
ACTUARIAL 
MODELS 
The Mathematics 


of Insurance 
Second Edition 


VLADIMIR I. ROTAR 


CRC Press 
Taylor & Francis Group 
Boca Raton London New York 


Taylor & Francis Group, an informa business 


A CHAPMAN & HALL BOOK 


CRC Press 

Taylor & Francis Group 

6000 Broken Sound Parkway NW, Suite 300 
Boca Raton, FL 33487-2742 


© 2015 by Taylor & Francis Group, LLC 
CRC Press is an imprint of Taylor & Francis Group, an Informa business 


No claim to original U.S. Government works 
Version Date: 20140609 


International Standard Book Number-13: 978-1-4822-2707-9 (eBook - PDF) 


This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been 
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright
holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this 
form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may 
rectify in any future reprint. 


Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including
photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the
publishers. 


For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For 
organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. 


Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for 
identification and explanation without intent to infringe. 


Visit the Taylor & Francis Web site at 
http://www.taylorandfrancis.com 


and the CRC Press Web site at 
http://www.crcpress.com 


Preface 


As mentioned in the first edition, this book is not a monograph; it is a textbook. Its goal is to
give a comprehensive exposition of the basic models of insurance processes. A supporting 
goal is to present some mathematical frameworks and methods used in Actuarial Modeling. 

The format of the book. The material is presented in the form of three nested “routes.” 
Route 1 contains the basic material designed for a one-semester course. This material is 
self-contained and has a moderate level of difficulty. 

Route 2 contains all of Route 1, offers a more complete exposition, and is suited for a 
two-semester course or self-study. It is slightly more challenging but should be
approachable for any reader familiar with primary concepts of calculus and linear algebra.

Route 3 (more precisely, the part that is not included in Routes 1-2) is designed primarily 
for graduate study. 

The routes are explicitly designated. To assist in navigating the text, we use markers 
similar to road signs. We also provide the Table of Contents with similar signs, so the 
reader has a general outline of the book’s structure. 

Potential audience. The book is intended for a large audience: students, actuaries,
mathematicians, and anyone conducting research in areas related to the subject of insurance
processes. Some parts of the book may be useful for studying economic and social models 
of a more general nature. The author believes that the main audience is students who are 
majoring in mathematics, statistics, economics or finance and who are taking courses in 
Actuarial Modeling. 

For the most part, this text is based on the course “Actuarial Mathematics” which the 
author has taught many times over the last fifteen years in the Department of Mathematics 
at the University of California at San Diego (UCSD) and in the Department of Mathematics 
and Statistics at San Diego State University (SDSU). 

Prerequisites are two semesters of calculus and one upper division course in probability 
for undergraduate students. There is no need for any specialized mathematical background 
that differs from the standard introductory topics. The only possible exception is the big
O and little o notation, which is not always introduced in standard Calculus courses. This
simple and convenient notation is defined and illustrated by examples in two pages in the 
Appendix. 

To facilitate reading, the main text is preceded by an introductory Chapter 0 containing a 
digest of basic facts from Probability Theory and the Theory of Interest. Ideally, the reader 
will not have to refer to outside sources for background material; everything is under one 
cover and is presented in a uniform notation and style. 

More on the contents. Our aim is to give an explicit exposition of main ideas and basic 
mathematical models. Therefore, we sometimes leave details important merely from an 
economic point of view to courses in the economics of insurance. 

This also applies to counting and computation issues. The book contains many
computational examples, but these serve as illustrations of results rather than instruction on
computational aspects of the theory.

More specifically, the book includes examples and exercises on numerical calculations
using Microsoft Excel. The goal here is not to teach numerical methods in insurance
(this requires software more powerful than Excel) but to assist the reader in developing an
appreciation of particular formulas and to demonstrate the practical possibilities and
restrictions of the different approaches under consideration.

Mainly, this textbook contains the standard material taught in courses in Actuarial
Modeling. Nevertheless, it also contains several topics which the author feels would make the
material more modern and/or somewhat deeper. Cases in point are the modern theory of
risk evaluation; a generalization of Arrow’s theorem; the classification of distributions with
regard to light and heavy tails; the accuracy of normal and Poisson approximations and,
consequently, a more accurate estimation of some characteristics of insurance processes;
a sufficiently detailed description of cash flows in the Markov environment; a model with
payments of dividends; a systematic presentation of reinsurance models; and the
application of the martingale technique to Ruin Theory.

Regarding the last topic, to make the use of this book possible for instructors and
readers who do not wish to discuss martingales, the exposition of Ruin Theory is organized
in such a way that knowledge of martingales is not required. This part of the exposition is
self-contained. However, in an additional section, the reader who is familiar with
martingales will be able to enjoy short proofs and a unified representation for the discrete and
continuous time cases.

On exams of actuarial societies. The book can be used as a source for preparing for
exams on actuarial models. The author purposefully reviewed the programs and problems
of the corresponding exams of the Society of Actuaries (SOA) and the Casualty
Actuarial Society (CAS). As a result, nearly all topics from these exams on actuarial models are
included in Routes 1 or 2 of the book.

To the same end, with the kind permission of the CAS, a significant number of problems
given in the previous sessions of the CAS Exams were included in the book as examples. In
the first edition, these problems concerned the 2003-2005 examinations. These problems
are still relevant and by no means obsolete, so it made sense to include many of them again.
However, new problems from the 2011-2013 examinations have also been included.

Certainly, anyone interested in taking actuarial exams should study the syllabus and
problems given in previous examinations. However, the author believes that upon the
completion of the text and its exercises, the preparation for these exams will be relatively effortless.

It is also worthwhile to mention two topics from the actuarial exam syllabi, which are 
not included in this book. First, we do not discuss data analysis. This topic should be 
considered separately in a course in actuarial statistics. Secondly, although we touch on 
simulation questions, we do not discuss them in detail. In particular, some technical
simulation methods were not included in the book.


What is new in the second edition. During the years since the first edition was
published, the book has been adopted as a text at a number of universities, and the author
himself has taught courses in Actuarial Mathematics at both universities mentioned above
using this book. The result has been great feedback from students, teaching assistants,
colleagues in lecturing, and individual readers who reached out with questions. This
feedback made it possible to eliminate many typos and a few more substantial (though still minor)
flaws. It has also provided a better understanding of which material went smoothly for
students and which material required a more detailed explanation or, on rare occasions, even
had to be eliminated or presented in another way.

All of this has led to significant editing and revamping of all chapters. This especially
concerns Chapter 6 (in the first edition, Chapter 7), “Global Characteristics of the Surplus
Process.” Chapter 3 on conditional expectation from the first edition has been shortened
and the material has been moved to Chapter 0, “On Preliminary Facts from Probability and
Interest.”

Another goal was to take into account some new results and current trends in teaching 
actuarial modeling (which may be seen in some new textbooks that were published during 
these years). In particular, pension fund modeling is becoming more topical, and thus a 
new chapter on pension models has been added to the text.

Overall, the author believes that these modifications have led to significant improvement, 
and the second edition provides a more robust and polished exposition of the material. 

Textbook web site. Possible additional remarks, more detailed answers to the exercises, 
and errata will be posted at http://actuarialtextrotar.sdsu.edu.


Acknowledgments 


My sincere and deep thanks go to my colleagues for useful discussions of various
scientific and pedagogical questions relevant to this book. This concerns

Caroline Bennet, San Diego State University, USA 

Eric Bieri, US Insurance Pricing, John Hancock Insurance Company, USA 

Paul Brock, San Diego State University, USA 

John Elwin, San Diego State University, USA 

Patrick Fitzsimmons, University of California at San Diego, USA 

Victor Korolev, Moscow State University, Russia 

Luis Latorre, an actuary, Madrid, Spain 

Jeffrey Liese, California Polytechnic State University, San Luis Obispo, USA 

R. Duncan Luce, University of California at Irvine, USA 

Donald Lutz, San Diego State University, USA 

Mukul Majumdar, Cornell University, USA

Michael O'Sullivan, San Diego State University, USA 

Yosef Rinott, Hebrew University, Jerusalem, Israel 

Alexey Sholomitskii, The High Economic School, Russia 

Sergey Shorgin, The Institute for Problems of Informatics, RAS, Russia 

Alexander Slastnikov, The Central Economics and Mathematics Institute, RAS, Russia 

Lee Van de Wetering, San Diego State University, USA 

Hans Zwiesler, Ulm University, Germany 


I am thankful to the referees of the first and second editions as well for their useful 
remarks. 




My special thanks go to Mark Dunster, Robert Grone, Collen Kelly, David Lesley, David 
Macky, Helen Noble, Eugene Pampiga, Steven Pierce, Peter Salamon, and Arthur Springer 
for help in English editing. 

I am also thankful to my former students Xin-Wei Du and Esteban Mansilla who, when 
taking courses on Actuarial Mathematics, read a draft of the corresponding chapters and 
also provided useful feedback. 

My thanks also go to Sarah Borg and Sara Zarei for their permission to use material from
their master’s theses (references can be found in the corresponding sections).

Sunil Nair has now been the editor of three of my books, and it is a pleasure to
acknowledge that I always felt comfortable with all matters concerning the preparation of the
book.

Additional thanks go to the editorial staff at Taylor & Francis and in particular to Shashi 
Kumar for help in the preparation of the final file. 

Last but not least, I am grateful to the Casualty Actuarial Society, and especially to 
Ms. Elizabeth Smith, for the kind permission to reprint some problems given in previous 
sessions of the CAS exams on actuarial models. 


V. R.


Contents 


Flag-signs in the margins designate a route: either Route 1 (which is entirely contained in 
Route 2), or Route 2 (which is entirely contained in Route 3), or Route 3; see the Preface 
and Introduction for a more detailed description of these routes. A new flag-sign is posted 
only at the moment of a route change. 


Introduction 


Chapter 0. Preliminary Facts from Probability and Interest


1 Probability and Random Variables
1.1 Sample space, events, probability measure
1.2 Independence and conditional probabilities
1.3 Random variables, random vectors, and their distributions
1.3.1 Random variables
1.3.2 Random vectors
1.3.3 Cumulative distribution functions
1.3.4 Quantiles
1.3.5 Mixtures of distributions
2 Expectation
2.1 Definitions
2.2 Integration by parts and a formula for expectation
2.3 Can we encounter an infinite expected value in models of real phenomena?
2.4
2.4.1 Variance and other moments
2.4.2 The Cauchy-Schwarz inequality
2.4.3 Covariance and correlation
2.5 Inequalities for deviations
2.6 Linear transformations of r.v.’s. Normalization
3.1
3.1.1 The binomial distribution
3.1.2 The multinomial distribution
3.1.3 The geometric distribution
3.1.4 The negative binomial distribution
3.1.5 The Poisson distribution
3.2
3.2.2 The exponential distribution
3.2.3 The Γ(gamma)-distribution
3.2.4 The normal distribution
4 Moment Generating Functions
4.1 Laplace transform
4.2 An example when a m.g.f. does not exist
4.3 The m.g.f.’s of basic distributions
4.3.1 The binomial distribution
4.3.2 The geometric and negative binomial distributions
4.3.3 The Poisson distribution
4.3.4 The uniform distribution
4.3.5 The exponential and gamma distributions
4.3.6 The normal distribution
4.4 The moment generating function and moments
4.5 Expansions for m.g.f.’s
4.5.1 Taylor’s expansions for m.g.f.’s
4.5.2 Cumulants
5 Convergence of Random Variables and Distributions
6 Limit Theorems
6.1 The Law of Large Numbers
6.2 The Central Limit Theorem
7 Conditional Expectations. Conditioning
7.1 Conditional expectation given a r.v.
7.1.1 The discrete case
7.1.2 The case of continuous distributions
7.2 Properties of conditional expectations
7.3 Conditioning and some useful formulas
7.3.1 A formula for variance
7.3.2 More detailed representations of the formula for total expectation
7.4 Conditional expectation given a random vector
7.4.1 General definitions
7.4.2 On conditioning in the multi-dimensional case
7.4.3 On the infinite-dimensional case
8 Elements of the Theory of Interest
8.1 Compound interest
8.2 Nominal rate
8.3 Discount and annuities
8.4 Accumulated value
8.5 Effective and nominal discount rates
9 Exercises


Chapter 1. Comparison of Random Variables. Preferences of Individuals

1 A General Framework and First Criteria
1.1 Preference order
1.2 Several simple criteria
1.2.1 The mean-value criterion
1.2.2 Value-at-Risk (VaR)
1.2.3 An important remark: risk measures rather than criteria
1.2.4 Tail-Value-at-Risk (TailVaR)
1.2.5 The mean-variance criterion
1.3 On coherent measures of risk
2 Comparison of R.V.’s and Limit Theorems
2.1 A simple model of insurance with many clients
2.2 St. Petersburg’s paradox
3 Expected Utility
3.1 Expected utility maximization
3.1.1 Utility function
3.1.2 Expected utility maximization criterion
3.1.3 Some “classical” examples of utility functions
3.2 Utility and insurance
3.3 How to determine the utility function in particular cases
3.4 Risk aversion
3.4.1 A definition
3.4.2 Jensen’s inequality
3.4.3 How to measure risk aversion in the EUM case
3.4.4 Proofs
3.5 A new perspective: EUM as a linear criterion
3.5.1 Preferences on distributions
3.5.2 The first stochastic dominance
3.5.3 The second stochastic dominance
3.5.4 The EUM criterion
3.5.5 Linearity of the utility functional
3.5.6 An axiomatic approach
4 Non-Linear Criteria
4.1 Allais’ paradox
4.2 Weighted utility
4.3 Implicit or comparative utility
4.3.1 Definitions and examples
4.3.2 In what sense the implicit utility criterion is linear
4.4 Rank Dependent Expected Utility
4.4.1 Definitions and examples
4.4.2 Application to insurance
4.4.3 Further discussion and the main axiom
5 Optimal Payment from the Standpoint of an Insured
5.1 Arrow’s theorem
5.2 A generalization
5.3 Historical remarks regarding the whole chapter
6 Exercises


Chapter 2. An Individual Risk Model for a Short Period

1 The Distribution of an Individual Payment
1.1 The distribution of the loss given that it has occurred
1.1.1 Characterization of tails
1.1.2 Some particular light-tailed distributions
1.1.3 Some particular heavy-tailed distributions
1.1.4 The asymptotic behavior of tails and moments
1.2 The distribution of the loss
1.3 The distribution of the payment and types of insurance
2 The Aggregate Payment
2.1 Convolutions
2.1.1 Definition and examples
2.1.2 Some classical examples
2.1.3 An additional remark regarding convolutions: Stable distributions
2.1.4 The analogue of the binomial formula for convolutions
2.2 Moment generating functions
3 Premiums and Solvency. Approximations for Aggregate Claim Distributions
3.1 Premiums and normal approximation. A heuristic approach
3.1.1 Normal approximation and security loading
3.1.2 An important remark: the standard deviation principle
3.2 A rigorous estimation
3.3 The number of contracts needed to maintain a given security level
3.4 Approximations taking into account the asymmetry of S
3.4.1 The skewness coefficient
3.4.2 The Γ-approximation
3.4.3 Asymptotic expansions and Normal Power approximation
4 Some General Premium Principles
5 Exercises


Chapter 3. A Collective Risk Model for a Short Period

1 Three Basic Propositions
2 Counting or Frequency Distributions
2.1 The Poisson distribution and theorem
2.1.1 A heuristic approximation
2.1.2 The accuracy of the Poisson approximation
2.2 Some other “counting” distributions
2.2.1 The mixed Poisson distribution
2.2.2 Compound mixing
2.3 The (a,b,0) and (a,b,1) (or Katz-Panjer’s) classes
3 The Distribution of the Aggregate Claim
3.1 The case of a homogeneous group
3.1.1 The convolution method
3.1.2 The case where N has a Poisson distribution
3.1.3 The m.g.f. method
3.2 The case of several homogeneous groups
3.2.1 The probability of coming from a particular group
3.2.2 A general scheme and reduction to one group
4 Premiums and Solvency. Normal Approximation
4.1 Limit theorems
4.1.1 The Poisson case
4.1.2 The general case
4.2 Estimation of premiums
4.3 The accuracy of normal approximation
4.4 Proof of Theorem 12
5 Exercises

Chapter 4. Random Processes and their Applications I

1 A General Framework and Typical Situations
1.1 Preliminaries
1.2 Processes with independent increments
1.2.1 The simplest counting process
1.2.2 Brownian motion
1.3 Markov processes
2 Poisson and Other Counting Processes
2.1 The homogeneous Poisson process
2.2 The non-homogeneous Poisson process
2.2.1 A model and examples
2.2.2 Another perspective: Infinitesimal approach
2.2.3 Proof of Proposition 1
2.3 The Cox process
3 Compound Processes
4 Markov Chains. Cash Flows in the Markov Environment
4.1 Preliminaries
4.2
4.2.1 Variables defined on states
4.2.2 Mean discounted payments
4.2.3 The case of absorbing states
4.2.4 Variables defined on transitions
4.2.5 What to do if the chain is not homogeneous
4.3 The first step analysis. An infinite horizon
4.3.1 Mean discounted payments in the case of infinite time horizon
4.3.2 The first step approach to random walk problems
4.4 Limiting probabilities and stationary distributions
4.5 The ergodicity property and classification of states
4.5.1 Classes of states
4.5.2 The recurrence property
4.5.3 Recurrence and travel times
4.5.4 Recurrence and ergodicity
5 Exercises

Chapter 5. Random Processes and their Applications II

1 Brownian Motion and Its Generalizations
1.1 More on properties of the standard Brownian motion
1.1.1 Non-differentiability of trajectories
1.1.2 Brownian motion as an approximation. The Donsker-Prokhorov invariance principle
1.1.3 The distribution of w_t, hitting times, and the maximum value of Brownian motion
1.2 The Brownian motion with drift
1.2.1 Modeling of the surplus process. What a Brownian motion with drift approximates in this case
1.2.2 A reduction to the standard Brownian motion
1.3 Geometric Brownian motion
2 Martingales
2.1 Two formulas of a general nature
2.2 Martingales: General properties and examples
2.3 Martingale transform
2.4 Optional stopping time and some applications
2.4.1 Definitions and examples
2.4.2 Wald’s identity
2.4.3 The ruin probability for the simple random walk
2.4.4 The ruin probability for the Brownian motion with drift
2.4.5 The distribution of the ruin time in the case of Brownian motion
2.4.6 The hitting time for the Brownian motion with drift
2.5 Generalizations
2.5.1 The martingale property in the case of random stopping time
2.5.2 A reduction to the standard Brownian motion in the case of random time
2.5.3 The distribution of the ruin time in the case of Brownian motion: another approach
2.5.4 Proof of Theorem 12
2.5.5 Verification of Condition 3 of Theorem 6
3 Exercises


Chapter 6. Global Characteristics of the Surplus Process

1 A General Framework
2 Ruin Models
2.1 Adjustment coefficients and ruin probabilities
2.1.1 Lundberg’s inequality
2.1.2 Proof of Lundberg’s inequality
2.1.3 The main theorem
2.2 Computing adjustment coefficients
2.2.1 A general proposition
2.2.2 The discrete time case: Examples
2.2.3 The discrete time case: The adjustment coefficient for a group of insured units
2.2.4 The case of a homogeneous compound Poisson process
2.2.5 The discrete time case revisited
2.2.6 The case of non-homogeneous compound Poisson processes
2.3 Finding an initial surplus
2.4 Trade-off between the premium and initial surplus
2.5 Three cases where the ruin probability may be computed precisely
2.5.1 The case with an exponentially distributed claim size
2.5.2 The case of the simple random walk
2.5.3 The case of Brownian motion
2.6 The martingale approach and a generalization of Theorem 2
2.7 The renewal approach
2.7.1 The first surplus below the initial level
2.7.2 The renewal approximation
2.7.3 The Cramér-Lundberg approximation
2.7.4 Proof of Theorem 5 from Section 2.7.1
2.8 Some recurrent relations and computational aspects
3 Criteria Connected with Paying Dividends
3.1 A general model
3.2 The case of the simple random walk
3.3 Finding an optimal strategy
4 Exercises

Chapter 7. Survival Distributions

1 The Probability Distribution of Lifetime
1.1 Survival functions and force of mortality
1.2 The time-until-death for a person of a given age
1.3 Curtate-future-lifetime
1.4 Survivorship groups
1.5 Life tables and interpolation
1.5.1 Life tables
1.5.2 Interpolation for fractional ages
1.6 Analytical laws of mortality
2 A Multiple Decrement Model
2.1 A single life
2.2 Another view: net probabilities of decrement
2.3 Survivorship group
2.4 Proof of Proposition 1
3 Multiple Life Models
3.1 The joint distribution
3.2 The lifetime of statuses
3.3 A model of dependency: conditional independence
3.3.1 A definition and the first example
3.3.2 The common shock model
4 Exercises
Chapter 8. Life Insurance Models 
A General Model ...........0. 0... 0000020 eee ees 
1.1 The present value of afuturepayment................ 
1.2 The present value of payments for a portfolio of many policies . . . 
Some Particular Types of Contracts... .............02.000,4 
2.1 Whole life insurance... 2... 2... ee ee ee 
2.1.1 The continuous time case (benefits payable at the mo- 


ment of death) ca eena a de ee ee 
2.1.2 The discrete time case (benefits payable at the end of the 

year‘of death): vi. > aiea Gah aie Ge 
2.1.3 A relation between A, andA, ............04. 
2.1.4 The case of benefits payable at the end of the m-thly pe- 


Od 2s ensue Be oh Cee whl ern aa Bein Meech aa che oa dea hh Nalan 

2:2 Deferred whole life insurance . . . . o.oo aoa a a 
2.2.1 The continuous time case . . . o.oo a aa 

2.2.2 The discrete time case . . . o.oo ooo a a a 

2.3 Term INSUranGe ian s,s ana gad ee he Re a 
2.3.1 Continuous time ............00. 0000004 ae 

2.3.2 Discrete mea 62... See ha ee Se BO te Be aini 

2.4 Endowments ...... a e E E ana ay Ae A E arna AS aA 
2.4.1 Pure endowment . . . o aoaaa 

2.4.2 Endowment <5 ooa a4 hee eis E a aar ce N A 
Varying: Benefits oh...) a ae ag le ad Ge ai a ty hg ody A 
3.1 Certain payments... .. 2... 2.2.0.0... 0000000084 


3.2 Random payments ............ 0... 0000000 ee 
Multiple Decrement and Multiple Life Models... ...........2., 
4.1 Multiple decrements ................ 2.000200 004 
4.2 Multiple life insurance .. 2... 2... 0.2... .020.000. 
On the Actuarial Notation... 2... 2.2... 2 ee ee ee 
ISXELCISES., Mace f fa sspe dad Alena ake G eae Whaat al a en Saeed tins & 




Chapter 9. Annuity Models
  1 Two Approaches to the Evaluation of Annuities
    1.1 Continuous annuities
    1.2 Discrete annuities
  2 Level Annuities. A Connection with Insurance
    2.1 Certain annuities
    2.2 Random annuities
  3 Some Particular Types of Level Annuities
    3.1 Whole life annuities
    3.2 Temporary annuities
    3.3 Deferred annuities
    3.4 Certain and life annuities
  4 More on Varying Payments
  5 Annuities with m-thly Payments
  6 Multiple Decrement and Multiple Life Models
    6.1 Multiple decrement
    6.2 Multiple life annuities
  7 Exercises
Chapter 10. Premiums and Reserves
  1 Premium Annuities
    1.1 General principles
    1.2 Benefit premiums: The case of a single risk
      1.2.1 Net rate
      1.2.2 The case where “Y is consistent with Z”
      1.2.3 Variances
      1.2.4 Premiums paid m times a year
      1.2.5 Combinations of insurances
    1.3 Accumulated values
    1.4 Percentile premiums
      1.4.1 The case of a single risk
      1.4.2 The case of many risks. Normal approximation
    1.5 Exponential premiums
  2 Reserves
    2.1 Definitions and preliminary remarks
    2.2 Examples of direct calculations
    2.3 Formulas for some standard types of insurance
    2.4 Recursive relations
  3 Exercises

Chapter 11. Pension Plans
  1 Valuation of Individual Pension Plans
    1.1 DB plans
      1.1.1 The APV of future benefits
      1.1.2 More examples of the benefit rate function B(x,h,y)
    1.2 DC plans
      1.2.1 Calculations at the time of retirement
      1.2.2 Calculations at the time of entering a plan
  2 Pension Funding. Cost Methods
    2.1 A dynamic DB fund model
      2.1.1 Modeling enrollment flow and pension payments
      2.1.2 Normal cost
      2.1.3 The benefit payment rate and the APV of future benefits
    2.2 More on cost methods
      2.2.1 The unit-credit method
      2.2.2 The entry-age-normal method
  3 Exercises
Chapter 12. Risk Exchange: Reinsurance and Coinsurance
  1 Reinsurance from the Standpoint of a Cedent
    1.1 Some optimization considerations
      1.1.1 Expected utility maximization
      1.1.2 Variance as a measure of risk
    1.2 Proportional reinsurance: Adding a new contract to an existing portfolio
      1.2.1 The case of a fixed security loading coefficient
      1.2.2 The case of the standard deviation premium principle
    1.3 Long-term insurance: Ruin probability as a criterion
      1.3.1 An example with proportional reinsurance
      1.3.2 An example with excess-of-loss insurance
  2 Risk Exchange and Reciprocity of Companies
    2.1 A general framework and some examples
    2.2 Two more examples with expected utility maximization
    2.3 The case of the mean-variance criterion
      2.3.1 Minimization of variances
      2.3.2 The exchange of portfolios
  3 Reinsurance Market
    3.1 A model of the exchange market of random assets
    3.2 An example concerning reinsurance
  4 Exercises
Appendix
  1 Summary Tables for Basic Distributions
  2 Tables for the Standard Normal Distribution
  3 Illustrative Life Table
  4 Some Facts from Calculus
    4.1 The “little o and big O” notation
      4.1.1 Little o
      4.1.2 Big O
    4.2 Taylor expansions
      4.2.1 A general expansion
      4.2.2 Some particular expansions
    4.3 Concavity
References
Answers to Exercises




Introduction 


We begin with what may be viewed as a small miracle. Two investors, Ann and David,
expect random incomes amounting to random variables (r.v.'s) $X_1$ and $X_2$, respectively. We
do not exclude the case where the $X$'s may take on negative values, which corresponds
to losses. For simplicity, suppose $X_1$ and $X_2$ are independent with the same probability
distribution. Then $X_1$ and $X_2$ have the same expected value $m = E\{X_i\}$ and variance
$\sigma^2 = \mathrm{Var}\{X_i\}$.

Assume that Ann and David evaluate the riskiness of their investments by the variance
of income and, being risk averse, want to reduce the riskiness of their future incomes.
To this end, Ann and David decide to divide the total income into equal shares, so each will
have the random income

$$Y = \frac{1}{2}(X_1 + X_2).$$

Then for both Ann and David, the expected value of the new income will be

$$E\{Y\} = \frac{1}{2}(m + m) = m,$$

that is, the same as before sharing the risk. On the other hand (see Chapter 0 for more
details), the variance

$$\mathrm{Var}\{Y\} = \frac{1}{4}(\sigma^2 + \sigma^2) = \frac{\sigma^2}{2}$$

is half as large.

Although this result is easy to prove, it is a fundamental fact, and it is indeed
quite astonishing. The riskiness of the system as a whole did not change, the r.v.'s $X_1$ and
$X_2$ remained as they were, but the level of risk faced by each participant has decreased.

Now, consider $n$ participants of a mutual risk exchange, and denote their random incomes
by $X_1, \ldots, X_n$. Assume again that the $X$'s are independent and identically distributed, and
set $m = E\{X_i\}$ and $\sigma^2 = \mathrm{Var}\{X_i\}$. If the participants divide their total income into equal
shares, then the income for each is

$$Y = \frac{X_1 + \ldots + X_n}{n}.$$

In this case (see again details in Chapter 0),

$$E\{Y\} = m, \quad \text{while} \quad \mathrm{Var}\{Y\} = \frac{\sigma^2}{n},$$

and for large $n$, the variance is close to zero. Thus, for a large number of participants, the
risk of each separate participant may be reduced nearly to zero.
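
The effect of sharing can be checked exactly on a small example. The sketch below is only an illustration under stated assumptions: the income distribution (values $-1$ and $+1$ with probability 1/2 each, so $m = 0$ and $\sigma^2 = 1$) and the function name `shared_income_moments` are hypothetical and do not come from the text. The code enumerates all outcomes of $n$ independent incomes and computes the mean and variance of the equal share $Y$.

```python
from fractions import Fraction
from itertools import product


def shared_income_moments(values, probs, n):
    """Exact mean and variance of Y = (X_1 + ... + X_n) / n for i.i.d. X's.

    values, probs: a discrete distribution of one income X (hypothetical example).
    All len(values) ** n joint outcomes are enumerated, so keep n small.
    """
    mean = Fraction(0)
    second_moment = Fraction(0)
    for combo in product(range(len(values)), repeat=n):
        p = Fraction(1)       # probability of this joint outcome (independence)
        total = Fraction(0)   # total income of the n participants
        for idx in combo:
            p *= probs[idx]
            total += values[idx]
        y = total / n         # equal share of each participant
        mean += p * y
        second_moment += p * y * y
    return mean, second_moment - mean * mean


# Assumed example: X = -1 or +1 with probability 1/2 each (m = 0, sigma^2 = 1).
values = [Fraction(-1), Fraction(1)]
probs = [Fraction(1, 2), Fraction(1, 2)]

for n in (1, 2, 4, 8):
    m, var = shared_income_moments(values, probs, n)
    print(n, m, var)  # the variance equals sigma^2 / n = 1/n
```

Exact fractions are used instead of simulation so that the agreement with $\sigma^2/n$ is not blurred by sampling error.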

The phenomenon we observed in its simplest form is called redistribution of risk. It
is at the heart of most financial stabilization mechanisms, certainly including insurance.
People use insurance because they can redistribute the risk, making it small for each if the
number of participants is large. Insurance companies play the role of organizers of such a
redistribution. Of course, they do it for profit, although there are non-profit organizations
of mutual insurance.

With some exaggeration, one may say that the theory we study in this book deals with 
various generalizations of the scheme above. To have a general picture, let us consider a 
brief outline of the book. 


Chapter 0 contains basic facts from Probability Theory and the Theory of Interest we use 
in the book. 

In Chapter 1, we will see that variance is not the only possible characteristic of riskiness, 
and as a matter of fact, is far from being the best. There are more sophisticated and flexible 
risk measures, and in Chapter 1 we study several of the most important ones. 

In Chapter 2, we build the first relatively complete model of insurance. First, we consider
just one client or insured. The object of study here is the random future payment X of the 
company to the client. 

Once we know how the r.v. $X$ appears, we consider a group of $n$ clients,
or a risk portfolio. In this case, we study the r.v. of the total payment

$$S_n = X_1 + \ldots + X_n, \tag{1}$$

where $X_i$ is the payment to the $i$th client. For small $n$, we can compute the distribution of
$S_n$ directly by certain methods studied in Chapter 2. For large $n$, we apply approximation
methods.

To cover clients' risks, the company collects premiums. Denote by $\pi_n$ the total premium
corresponding to the risk portfolio above. It is natural to expect, and we prove it, that
for the company to function with financial stability, the premium $\pi_n$ needs to be larger
than the mean total payment $E\{S_n\}$, which is usually called a net premium. The difference
$A_n = \pi_n - E\{S_n\}$, called a security loading, is the additional payment for the risk the insurer
incurs.

The determination of a value of $A_n$ acceptable for the company is one of the main, if not
the main, tasks of Actuarial Modeling, and in the course of the book, we consider several
basic results on this point.


In Chapter 3, we generalize the previous model and, instead of (1), consider the sum

$$S_N = X_1 + \ldots + X_N, \tag{2}$$

where not only the $X$'s but also the number of terms in the sum, $N$, is random.

For insurance, the most important interpretation of this scheme concerns the situation
where we deal with a portfolio as a whole and are interested in the total claim the
company will have to pay out. Here, $N$ is the number of future claims to be received by the
company during a certain period, and the $X$'s are the sizes of payments corresponding to
these claims.

In Chapter 3, we explore possible probability distributions of the r.v. $N$ and of the aggregate
payment $S_N$ itself.




Both models above are static. Chapters 4-6 concern dynamic models. The main object
of study here is a surplus process $R_t$, where $t$ stands for time and the process itself may be
defined, for example, as follows.

Let $N_t$ be the number of claims received by time $t$. We consider $N_t$ at all moments $t$ from
some time interval, and hence we view $N_t$ as a random process.

By analogy with (2), the total aggregate claim paid by the company by time $t$ is the r.v.

$$S_{N_t} = X_1 + \ldots + X_{N_t},$$

where $X_i$ is the size of the $i$th claim. The process $S_{N_t}$ is called a claim process. In Chapters
4-5, we study various types of the processes $N_t$ and $S_{N_t}$.

Along with the flow of claims, we consider the cash flow of premiums the company is
receiving. Let $c_t$ be the total premium collected by time $t$. Suppose also that at time $t = 0$,
the company has an initial surplus $u$. Then the surplus of the company at time $t$ is the r.v.

$$R_t = u + c_t - S_{N_t}. \tag{3}$$
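
To make the definition of the surplus concrete, here is a minimal sketch that evaluates $R_t = u + c_t - S_{N_t}$ for one fixed claim history. Several things are assumptions made only for illustration: the premium is taken to accumulate linearly, $c_t = c \cdot t$, and the claim times, claim sizes, and the names `surplus` and `premium_rate` are hypothetical; the text itself leaves $c_t$ and the claim process general.

```python
def surplus(u, premium_rate, claim_times, claim_sizes, t):
    """Surplus R_t = u + c_t - S_{N_t} at time t.

    Assumption (not from the text): premiums accumulate linearly,
    c_t = premium_rate * t. The claims that arrived by time t are summed
    to obtain the aggregate claim S_{N_t}.
    """
    paid = sum(size for when, size in zip(claim_times, claim_sizes) if when <= t)
    return u + premium_rate * t - paid


# Hypothetical claim history: arrival times and the corresponding sizes.
claim_times = [0.5, 1.2, 2.7]
claim_sizes = [3.0, 1.0, 4.0]

for t in (0.0, 1.0, 2.0, 3.0):
    print(t, surplus(10.0, 2.0, claim_times, claim_sizes, t))
# prints 10.0, 9.0, 10.0, 8.0 for t = 0, 1, 2, 3
```

In a full model the claim times and sizes would themselves be random; fixing them here only shows how the three terms of (3) combine.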


In Chapter 6, we consider global characteristics of the process $R_t$. In one way or another,
they reflect the “quality” of the process $R_t$ or, in other words, the extent to which the future
surplus process will meet the goals of the company. These characteristics are relevant either
to the profitability of insurance operations or to their viability, i.e., the degree of protection
against adversity.


Chapters 7-10 address life insurance and annuities. (The last term means a series of 
regular payments; for example, pension payments.) Risk redistribution continues to play 
an important role in this case, but this type of insurance product has an additional special 
feature: an essential time lag between the moment when the client pays a premium and the 
time when the company pays the corresponding benefit. 

The company somehow invests the premiums paid, and during the lifetime period
mentioned the amount invested is growing at some rate. Therefore, the total amount of the
premium sufficient for the company to fulfill its obligation may be (and should be) less 
than the size of the benefit. To determine how much less is the main task of the actuary 
in this case. The models are rather sophisticated and cover various types of insurance and 
annuity contracts. 


In Chapter 11, we explore particular but important models of pension plans. 


In Chapter 12, we return to what we started with, that is, to risk redistribution. However,
we consider it at another level, namely, at the level of reinsurance, when companies
redistribute the risk incurred between themselves. Such a risk redistribution may be even more
flexible than that at the first level: the companies may share individual risks or redistribute
the total accumulated risk in different ways.


* * *


As mentioned in the preface, the material is presented in the form of three nested “routes”. 
Route 1 consists of the material designed for a one-semester course. Route 2 is intended for
a broader and deeper study, perhaps, for a two-semester course. Route 3 (more precisely, 
the part that is not included in Routes 1-2) is designed primarily for graduate study. Route 
1 is completely contained in Route 2, and both of them are included in Route 3. 


All routes are self-contained. 


The special “road signs” will help the reader to continue in the chosen route. For example,
the sign

Route 1 ⇒ page 111

indicates that the readers who chose Route 1 should advance to p.111 to continue the route.
Below this sign, a small “flag” in the margin designates which route runs now (as in the 
margin here). In other words, if the reader does not switch to the page mentioned (as p.111 
above), she/he will have entered Route 2 (and hence, Route 3 too). 

If the reader goes to the page mentioned (as p.111 above), then in the margin of this 
page, she/he will see a sign confirming that this is the right place to move to, and showing 
a particular location on the page where Route 1 picks up (as in the margin here). 

To see a general picture of the book’s structure, the corresponding flag-signs are also 
placed in the Table of Contents. (The rare cases where we switch routes even inside a 
subsubsection are not reflected in Contents.) 

Exercise sections are excluded from this routing system. If the main text of a route 
continues in another chapter, the road sign directs the reader to this chapter though the 
exercise section of the current chapter may contain exercises from the current route. We 
hope that if the reader wants to do exercises, she/he will visit the exercise section anyway. 

In the exercises, the problems belonging to Route 2 are marked by an asterisk *, problems 
from Route 3 by **. However, if a whole section belongs to Route 2 or 3, in the exercises, 
we mark by * or ** only the title of this section. In the exercises for chapters belonging 
entirely to Route 2 or 3, we naturally do not mark anything. 

Occasionally, purely technical proofs or additional remarks are enclosed by the signs
▷ and ◁. This material may be omitted in the first reading.

If we have not used a definition or fact recently or are using it for the first time, then the 
corresponding references are given. This is being done just in case. If the reader is already 
familiar with a referred item, it makes sense to ignore the reference and move ahead. 


Certainly, when moving along Route 1, the reader is welcome to look around or venture
into areas that are not included in the first route. However, the reader should not be
discouraged if something seems difficult. Route 2 is indeed slightly more involved than Route
1, but only slightly, and requires just a more in-depth reading.

Another matter is that if you are taking a one-semester course, then it may be reasonable 
to postpone the material of Route 2—at least, most of this material—for a while, and return 
to it when you have more time and experience. Enticing topics you skipped on the way will 
await you. 




More technical remarks. The symbol W indicates the end of a proof, while the symbol 
marks the end of an example or a series of examples. 

The numbering of sections and formulas in each chapter is self-contained. The adopted 

system of references is clear from the following examples. 

Section 2.3 is the third subsection of the second section of the current chapter. 

The formula (2.3.4) is the fourth formula of the third subsection of the second section of 
the current chapter. 

Example 2.3-4 is the fourth example from Section 2.3 of the current chapter. 

In each chapter, theorems, propositions, and corollaries are being enumerated in a linear 
fashion through the whole chapter: the theorem that appears after Proposition 2 is Theorem 
3, and the corollary following Theorem 3 is Corollary 4. 

If we refer to a formula, section, example, etc., from another chapter, we write the number 
of the chapter to which we are referring in bold font. For instance, Section 1.2.3 is the third 
subsection of the second section of the first chapter. Formula (1.2.3.4) is the formula (2.3.4) 
of the first chapter. Theorem 1.2 is Theorem 2 of Chapter 1, etc. 


The following abbreviations are used throughout the entire book. 


APV—actuarial present value;
CLT—central limit theorem;
c.d.f.—cumulative distribution function;
c.v.—coefficient of variation;
d.f.—distribution function (we omit here the adjective “cumulative”);
EU—expected utility;
EUM—expected utility maximization;
FSD—first stochastic dominance;
iff—if and only if;
i.i.d.—independent and identically distributed;
LLN—law of large numbers;
l.e.r.—loss elimination ratio;
l.-h.s.—left-hand side;
m.g.f.—moment generating function;
p.d.f.—probability density function;
RDEU—rank dependent expected utility;
r.v.—random variable;
r.vec.—random vector;
r.-h.s.—right-hand side;
SSD—second stochastic dominance.




Chapter 0 


Preliminary Facts from Probability and 
Interest 


This chapter is intended primarily for further reference. Nevertheless, it is recommended
that the reader at least skim through the chapter before starting to read the main text.

We deal here mainly with definitions and basic notions to which we will repeatedly refer
throughout the book. Most facts are given without proof, but we discuss their significance
and plausibility. We touch briefly on simple or standard notions, and pay more attention
to notions that are either less traditional but necessary or more difficult; for example,
moment generating functions and conditional expectations. For the last topic, we even
give exercises at the end of the chapter.

Sections 1-7 concern Probability Theory. In Section 8, we consider elements of the 
theory of interest. 


1 PROBABILITY AND RANDOM VARIABLES 
1.1 Sample space, events, probability measure 


When building a model of any experiment, we first specify the space of all possible
outcomes which may occur as a result of the experiment. We call such a space a sample
space, or a space of elementary outcomes. A traditional notation for a sample space is $\Omega$,
and for its elements, the individual outcomes, is $\omega$. So, $\Omega = \{\omega\}$.

We use the standard notation for operations on sets $A, B, \ldots$ of elements from $\Omega$: complement
$A^c$, union $A \cup B$, and intersection $A \cap B$ of sets $A, B$. For $A \cap B$ we use the shorter notation $AB$,
and call it the product of $A$ and $B$. The reader is invited to verify on her/his own the set
identities

$$(A \cap B)^c = A^c \cup B^c, \qquad (A \cup B)^c = A^c \cap B^c. \tag{1.1.1}$$


(A proof may be found in practically any probability textbook, e.g., [102], [116], [122].) 
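
For a finite sample space, the identities (1.1.1) can also be confirmed by brute force. The following sketch is an illustration only: the five-point space `omega` and the helper names are hypothetical, and an exhaustive check on one small space is of course not a proof for general $\Omega$.

```python
from itertools import combinations

# A small hypothetical sample space with five elementary outcomes.
omega = frozenset(range(5))


def complement(a):
    """Complement A^c of a set A with respect to omega."""
    return omega - a


def all_subsets(space):
    """All subsets of a finite space (2 ** len(space) of them)."""
    items = sorted(space)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def de_morgan_holds():
    """Check both identities of (1.1.1) for every pair of subsets A, B."""
    subsets = all_subsets(omega)
    return all(
        complement(a & b) == complement(a) | complement(b)
        and complement(a | b) == complement(a) & complement(b)
        for a in subsets
        for b in subsets
    )


print(de_morgan_holds())  # True
```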
Next, we specify a collection, or a class, $\mathcal{A}$ of sets $A$ which we will consider. Sets from
$\mathcal{A}$ are called events.
For the theory to be complete and non-contradictory, the class $\mathcal{A}$ should be sufficiently
rich; more precisely, we assume the following properties to be true:

(a) if $A \in \mathcal{A}$, then the complement $A^c$ also belongs to $\mathcal{A}$;

(b) if events $A_1, A_2, \ldots$ are from $\mathcal{A}$, then their union $\bigcup_i A_i$ also belongs to $\mathcal{A}$. (1.1.2)


From (1.1.2) it follows that

(c) the empty set $\emptyset$ and the whole space $\Omega$ belong to $\mathcal{A}$;
(d) for any events $A_1, A_2, \ldots$ from $\mathcal{A}$, their intersection $\bigcap_i A_i \in \mathcal{A}$.

(The last property follows from properties (a), (b) and (1.1.1). To prove (c), we take any
set $A \in \mathcal{A}$, and write $\emptyset = A \cap A^c \in \mathcal{A}$ by property (d), and $\Omega = A \cup A^c \in \mathcal{A}$ by property (b).)


We call a space $\Omega$ discrete if it consists of a finite or countable number of points: $\Omega =
\{\omega_1, \omega_2, \ldots\}$. Otherwise, we call the space non-discrete or uncountable.
If a space $\Omega$ is discrete, we can consider as $\mathcal{A}$ the class of all subsets of $\Omega$.

▷ If $\Omega$ is uncountable, for example, if it may be identified with the real line, we cannot consider all
subsets of $\Omega$. The reason is that in general it is impossible to define probabilities simultaneously for
all sets. So, some sets should be excluded from consideration. Fortunately, it suffices to exclude only
some very exotic sets, which by no means will prevent us from building models of real phenomena.
(For detail, see any advanced textbook on Measure Theory or Probability, e.g., [27], [70], [120],
[129].) ◁


Throughout the book we assume that the class $\mathcal{A}$ of events under consideration is such
that we are able to define probabilities of all events as we do in the following definition.

We call a probability distribution, or a probability measure, a function $P(A)$ of sets $A$
from $\mathcal{A}$ such that

(i) $0 \le P(A) \le 1$ for all $A \in \mathcal{A}$;
(ii) $P(\Omega) = 1$; (1.1.3)
(iii) $P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$ for any disjoint events $A_1, A_2, \ldots \in \mathcal{A}$.


The value of P(A) is called the probability of event A. 

In particular, from definition (1.1.3) it follows that $P(\emptyset) = 0$. Indeed, $1 = P(\Omega) = P(\emptyset \cup
\Omega) = P(\emptyset) + P(\Omega) = P(\emptyset) + 1$. So, $1 = P(\emptyset) + 1$, which implies that $P(\emptyset) = 0$.

Note also that in Property (i) we might require just $P(A) \ge 0$. Indeed, by Properties (iii)
and (ii), $P(A) \le P(A) + P(A^c) = P(\Omega) = 1$. We presented Property (i) in the above form
for the completeness of the picture.


Next, we state the following two elementary properties of P(A). 


• For any events $A_1, A_2, \ldots \in \mathcal{A}$, not necessarily disjoint,
$$P\left(\bigcup_i A_i\right) \le \sum_i P(A_i). \tag{1.1.4}$$

• For any $A, B \in \mathcal{A}$,
$$P(A \cup B) = P(A) + P(B) - P(AB).$$
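
On a discrete space, where every subset may serve as an event, these properties are easy to verify directly. The sketch below uses a hypothetical example that is not in the text: two fair dice with 36 equiprobable outcomes, and two events `A` and `B` chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical discrete sample space: ordered results of two fair dice.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
weight = {w: Fraction(1, 36) for w in omega}  # equal elementary probabilities


def prob(event):
    """P(A) for an event A given as a set of elementary outcomes."""
    return sum((weight[w] for w in event), Fraction(0))


A = {w for w in omega if w[0] == 6}           # the first die shows 6
B = {w for w in omega if w[0] + w[1] >= 10}   # the sum is at least 10

# P(empty set) = 0 and P(omega) = 1, as follows from (1.1.3).
print(prob(set()), prob(set(omega)))

# P(A u B) = P(A) + P(B) - P(AB); the events here are not disjoint.
print(prob(A | B), prob(A) + prob(B) - prob(A & B))

# Inequality (1.1.4) for two events.
print(prob(A | B) <= prob(A) + prob(B))
```

Since $A$ and $B$ overlap in three outcomes, the inequality (1.1.4) is strict in this example.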


1.2 Independence and conditional probabilities 


Events $A_1$ and $A_2$ are said to be independent if

$$P(A_1 A_2) = P(A_1)P(A_2). \tag{1.2.1}$$

We say that events $A_1, A_2, \ldots, A_n$ are mutually independent if for any collection of distinct
integers $(i_1, i_2, \ldots, i_k)$ from $(1, \ldots, n)$

$$P(A_{i_1} \cdots A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k}).$$

For example, events $A_1, A_2, A_3$ are mutually independent if they are pairwise independent,
that is,

$$P(A_1 A_2) = P(A_1)P(A_2), \quad P(A_1 A_3) = P(A_1)P(A_3), \quad P(A_2 A_3) = P(A_2)P(A_3),$$

and

$$P(A_1 A_2 A_3) = P(A_1)P(A_2)P(A_3).$$
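
The last condition cannot be dropped: pairwise independence alone does not imply mutual independence. A standard illustration, not taken from the text, uses two tosses of a fair coin; the event names `A1`, `A2`, `A3` and the helper `product_rule_holds` below are hypothetical labels for this sketch.

```python
from fractions import Fraction
from itertools import combinations

# Two tosses of a fair coin: four equiprobable outcomes.
omega = ["HH", "HT", "TH", "TT"]


def prob(event):
    """P(A) under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))


A1 = {w for w in omega if w[0] == "H"}    # first toss is heads
A2 = {w for w in omega if w[1] == "H"}    # second toss is heads
A3 = {w for w in omega if w[0] == w[1]}   # both tosses coincide
events = [A1, A2, A3]


def product_rule_holds(subfamily):
    """True if P(intersection) equals the product of the probabilities."""
    intersection = set(omega)
    product = Fraction(1)
    for event in subfamily:
        intersection &= event
        product *= prob(event)
    return prob(intersection) == product


pairwise = all(product_rule_holds(pair) for pair in combinations(events, 2))
triple = product_rule_holds(events)
print(pairwise, triple)  # True False
```

Here each pair of events intersects in the single outcome HH with probability 1/4 = 1/2 · 1/2, while the triple intersection also has probability 1/4 rather than 1/8.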


The conditional probability of event $A$ given event $B$ is

$$P(A \mid B) = \frac{P(AB)}{P(B)},$$

provided $P(B) \ne 0$. From (1.2.1) it follows that

If $A$ and $B$ are independent, then $P(A \mid B) = P(A)$.


The following formula, in spite of its simplicity, proves to be a very useful tool in solving 
a great many probability problems. 


The law of total probability, or the formula for total probability. Consider a collection
of disjoint events $H_i$ such that $\bigcup_i H_i = \Omega$. Such a collection is called a partition.
Assume that $P(H_i) \ne 0$ for all $i$'s. Then for any event $A$,

$$P(A) = \sum_i P(A \mid H_i) P(H_i). \tag{1.2.2}$$


In the next formula, we interpret the $H_i$'s defined above as the events corresponding to
different hypotheses about the nature of an experiment, or the different possible causes of an
observable event $A$. The event $A$ itself is viewed as a particular result of the experiment.
In this case, $P(H_i)$ is called the prior probability that the $i$th hypothesis is true (that is,
before the experiment is carried out). The probability $P(H_i \mid A)$ is called posterior. It is the
probability that the $i$th hypothesis is true given that $A$ occurred as a result of the experiment.


The Bayes formula (or rule). For any event $A$ such that $P(A) \ne 0$, and events $H_i$ defined
above, for each $i$,

$$P(H_i \mid A) = \frac{P(A \mid H_i) P(H_i)}{P(A)} = \frac{P(A \mid H_i) P(H_i)}{\sum_j P(A \mid H_j) P(H_j)}. \tag{1.2.3}$$
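
Formulas (1.2.2) and (1.2.3) translate directly into a short computation. The sketch below is a hedged illustration: the two hypotheses (a "low-risk" and a "high-risk" client), the numerical priors and likelihoods, and the function names are all hypothetical and were chosen only to show the mechanics.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """P(A) = sum_i P(A | H_i) P(H_i), formula (1.2.2)."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def posteriors(priors, likelihoods):
    """P(H_i | A) for every i, formula (1.2.3)."""
    p_a = total_probability(priors, likelihoods)
    return [p * l / p_a for p, l in zip(priors, likelihoods)]


# Hypothetical numbers: H_1 = "low-risk client", H_2 = "high-risk client",
# A = "the client files a claim during the year".
priors = [Fraction(7, 10), Fraction(3, 10)]       # P(H_1), P(H_2)
likelihoods = [Fraction(1, 10), Fraction(3, 10)]  # P(A | H_1), P(A | H_2)

print(total_probability(priors, likelihoods))  # 4/25
print(posteriors(priors, likelihoods))         # posteriors 7/16 and 9/16
```

With these assumed numbers, observing a claim raises the probability of the high-risk hypothesis from the prior 3/10 to the posterior 9/16.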


[FIGURE 1. (a) The density $f(x)$ and the area above a set $B$; (b) the probability element $f(x)\,dx$.]


1.3 Random variables, random vectors, and their distributions 
1.3.1 Random variables 


Below, we will use the symbol B for subsets of the real line. 

A random variable (r.v.) $X = X(\omega)$ is a function defined on the space $\Omega$. The function
of sets $F_X(B) = P(X \in B)$ is said to be the probability distribution (briefly, distribution) of
$X$. More precisely, $F_X(B)$ is the probability of the set $\{\omega : X(\omega) \in B\}$.

In other words, $F_X(B)$ is the probability that the value of $X$ will fall in the set $B$. If it
cannot cause misunderstanding, we will omit the index $X$ in $F_X(B)$.


▷ Speaking more rigorously, for the theory to be non-contradictory, we should consider only sets
$B$ for which we can define the notion of length. Such sets are called Borel sets. Non-Borel sets
exist but are very exotic, so we can exclude them from consideration. (A rigorous definition and/or
examples may be found in advanced textbooks on Measure Theory or Probability; see, e.g., [27],
[70], [120], [124], [129].)

Similar remarks may be made regarding the r.v. $X$ itself. For the probability $P(X \in B)$ to be well
defined, the set $\{\omega : X(\omega) \in B\}$ should belong to the class $\mathcal{A}$ of events $A$ for which the probability
$P(A)$ is defined. So formally, we define a r.v. as a function $X(\omega)$ for which the set $\{\omega : X(\omega) \in B\} \in
\mathcal{A}$ for any (Borel) set $B$. (For more detail see, e.g., [27], [59], [70], [76], [120], [129].) In this book,
we do not consider this issue. ◁


A r.v. $X$ and its distribution $F(B)$ are called discrete if $X$ assumes a finite or countably
infinite number of values $x_1, x_2, \ldots$. We say that the distribution $F(B)$ is concentrated at
points $x_1, x_2, \ldots$.

A r.v. $X$ and its distribution $F(B)$ are called absolutely continuous if there exists a
non-negative function $f(x)$ such that for any $B$

$$F(B) = \int_B f(x)\,dx, \tag{1.3.1}$$

that is, $P(X \in B)$ is the area above the set $B$ and under $f(x)$; see Fig.1a. The function $f(x)$
is called the probability density function of the r.v. $X$. Briefly, we call $f(x)$ the density of $X$.
Setting $B = (-\infty, \infty)$, we have

$$\int_{-\infty}^{\infty} f(x)\,dx = 1. \tag{1.3.2}$$


Since a point is an interval of zero length, from (1.3.1) it follows that

For any absolutely continuous r.v. $X$, and any number $a$,
$$P(X = a) = 0. \tag{1.3.3}$$

For an infinitesimally small interval $[x, x + dx]$, the probability $P(x \le X \le x + dx)$ may
be represented as $f(x)\,dx$; see Fig.1b.
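
The relations (1.3.1), (1.3.2) and the interpretation of $f(x)\,dx$ can be illustrated numerically. In the sketch below, the choice of the density $f(x) = e^{-x}$ for $x \ge 0$ is an assumed example, the midpoint-rule helper `integrate` is a hypothetical name, and the infinite range is truncated at 50; none of these come from the text.

```python
import math


def density(x):
    """Assumed example density: f(x) = exp(-x) for x >= 0, and 0 otherwise."""
    return math.exp(-x) if x >= 0 else 0.0


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


# (1.3.2): the total integral of the density is 1 (the tail beyond 50 is negligible).
total = integrate(density, 0.0, 50.0)

# (1.3.1) with B = [1, 2]: P(1 <= X <= 2), compared with the closed form.
p_interval = integrate(density, 1.0, 2.0)
exact = math.exp(-1) - math.exp(-2)

# For a small dx, P(x <= X <= x + dx) is close to f(x) dx.
dx = 1e-4
p_small = integrate(density, 1.0, 1.0 + dx, steps=100)

print(total, p_interval, exact, p_small, density(1.0) * dx)
```

The midpoint rule is used only because it needs no external libraries; any standard quadrature routine would serve the same purpose.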

The two types of distributions described above do not exhaust all possible distributions.
For example, we can consider a mixture of discrete and continuous distributions; for
more detail, see Section 1.3.5. In this book, we omit the adjective ‘absolute’ and refer to
absolutely continuous distributions as continuous.


▷ The point is that there exist distributions for which (1.3.3) is true (and hence these distributions
are not discrete), but which cannot be represented in the form of (1.3.1), that is, they do not have
densities. Such distributions are called continuous but non-absolutely continuous; see, e.g., [27],
[120], [129]. These distributions are rather exotic, and we do not consider them here. ◁


1.3.2 Random vectors 


Let $X = (X_1, \ldots, X_k)$, where the $X_i$'s are r.v.'s. We call $X$ a $k$-dimensional random vector
(r.vec.). The function of sets $F(B) = F_X(B) = P(X \in B)$, where now $B$ is a subset of the
$k$-dimensional space $\mathbb{R}^k$, is the distribution of $X$.

A r.vec. $X = (X_1, \ldots, X_k)$ and its distribution $F(B)$ are said to be discrete if all coordinates
$X_i$ are discrete. A r.vec. $X$ and its distribution $F(B)$ are said to be continuous if there exists
a non-negative function $f(x)$, where $x = (x_1, \ldots, x_k)$, such that for any $B$ from $\mathbb{R}^k$,

$$F(B) = \int \cdots \int_B f(x)\,dx. \tag{1.3.4}$$

The integral in (1.3.4) is a $k$-dimensional integral, and the differential $dx = dx_1 \cdots dx_k$.

Setting $B = \mathbb{R}^k$, we have

$$\int \cdots \int_{\mathbb{R}^k} f(x)\,dx = 1. \tag{1.3.5}$$

R.v.'s $X_1, \ldots, X_k$ are said to be mutually independent if for any sets $B_1, \ldots, B_k$ from the real
line, the events $\{X_1 \in B_1\}, \ldots, \{X_k \in B_k\}$ are mutually independent.


Consider now the case $k = 2$, and $X = (X_1, X_2)$. Let the r.v. $X_i$ take on values $x_{i1}, x_{i2}, \ldots$
(where $i = 1, 2$), and $f_{ij} = P(X_1 = x_{1i}, X_2 = x_{2j})$. We say that the probabilities $f_{ij}$ specify
the joint distribution of $(X_1, X_2)$, and call $f_{ij}$ joint probabilities.

The probabilities $f_i^{(1)} = P(X_1 = x_{1i})$ and $f_j^{(2)} = P(X_2 = x_{2j})$ are called marginal; they
characterize the distributions of the coordinates separately. The collections
$f^{(1)} = (f_1^{(1)}, f_2^{(1)}, \ldots)$ and $f^{(2)} = (f_1^{(2)}, f_2^{(2)}, \ldots)$ are called marginal distributions.
The joint distribution completely determines the marginal distributions:

$$f_i^{(1)} = \sum_j f_{ij}, \qquad f_j^{(2)} = \sum_i f_{ij}.$$
7 i 


[FIGURE 2. Panels (a) and (b): the four equiprobable points considered in Example 1.]


The converse assertion is not true in general. To specify the joint distribution, we should 
know the marginal distributions and the structure of dependency between X; and X2. In the 
case of independency the situation is simpler. 


Proposition 1 Discrete r.v.'s $X_1, X_2$ are independent if and only if
$$f_{ij} = f_i^{(1)} f_j^{(2)} \quad \text{for all } i, j.$$

EXAMPLE 1. Let a r.vec. $X = (X_1, X_2)$ take on four vector-values corresponding to
four points in Fig.2a with equal probabilities 1/4. In this case, it is convenient to set
$f_{ij} = P(X_1 = i, X_2 = j)$, where $i = \pm 1$, $j = \pm 1$. One may guess that in this case, $X_1, X_2$
are independent. Indeed, $f_1^{(1)} = P(X_1 = 1) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$, and similarly, all other marginal
probabilities $f_{-1}^{(1)} = f_1^{(2)} = f_{-1}^{(2)} = \frac{1}{2}$. Hence, $f_{ij} = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = f_i^{(1)} f_j^{(2)}$ for all $i = \pm 1$, $j = \pm 1$.

Now let $X$ take on values corresponding to the four points in Fig.2b with equal probabilities.
In this case, $X_1$ and $X_2$ take on values $0, +1, -1$ with probabilities $\frac{1}{2}, \frac{1}{4}, \frac{1}{4}$, respectively,
and are certainly dependent. If, for instance, $X_1$ takes on the value 1, then $X_2$ may be only
zero, while if $X_1 = 0$, then the r.v. $X_2$ may be either 1 or $-1$.

Certainly, the fact that $X_1, X_2$ are dependent follows from Proposition 1 also. For example,
$f_{11} = P(X_1 = 1, X_2 = 1) = 0$, while $f_1^{(1)} f_1^{(2)} = \frac{1}{4} \cdot \frac{1}{4} \ne 0$.
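
The computations of Example 1 can be reproduced mechanically: sum the joint probabilities to get the marginals and then test the product condition of Proposition 1. In the sketch below, the coordinates of the four points are an assumption reconstructed from the description in the text (the figure itself is not reproduced here), and the names `marginals`, `independent`, `fig_2a`, `fig_2b` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def marginals(joint):
    """Marginal distributions of X_1 and X_2 from joint probabilities f_ij."""
    f1, f2 = defaultdict(Fraction), defaultdict(Fraction)
    for (x1, x2), p in joint.items():
        f1[x1] += p   # sum over the values of X_2
        f2[x2] += p   # sum over the values of X_1
    return dict(f1), dict(f2)


def independent(joint):
    """Proposition 1: joint probability = product of marginals for all pairs."""
    f1, f2 = marginals(joint)
    return all(
        joint.get((x1, x2), Fraction(0)) == f1[x1] * f2[x2]
        for x1 in f1
        for x2 in f2
    )


quarter = Fraction(1, 4)
# Assumed coordinates: corners (+-1, +-1) for Fig.2a, axis points for Fig.2b.
fig_2a = {(1, 1): quarter, (1, -1): quarter, (-1, 1): quarter, (-1, -1): quarter}
fig_2b = {(1, 0): quarter, (-1, 0): quarter, (0, 1): quarter, (0, -1): quarter}

print(independent(fig_2a), independent(fig_2b))  # True False
print(marginals(fig_2b)[0])  # values 1 and -1 have probability 1/4, value 0 has 1/2
```

Pairs of values that never occur together are treated as having joint probability zero, which is exactly what breaks the product condition in the second case.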


In the case where $k > 2$, the results are similar, but the notation is a bit cumbersome. We
define the joint probabilities $f_{i_1 i_2 \ldots i_k} = P(X_1 = x_{1 i_1}, \ldots, X_k = x_{k i_k})$, and the marginal
probabilities $f_{i_m}^{(m)} = P(X_m = x_{m i_m})$. Then

$$f_{i_m}^{(m)} = \sum f_{i_1 i_2 \ldots i_k},$$

where the summation is over all $i_1, \ldots, i_{m-1}, i_{m+1}, \ldots, i_k$, that is, $i_m$ is fixed.
The independence case is described by

Proposition 2 Discrete r.v.'s $X_1, \ldots, X_k$ are mutually independent if and only if

$$f_{i_1 i_2 \ldots i_k} = f_{i_1}^{(1)} f_{i_2}^{(2)} \cdots f_{i_k}^{(k)} \quad \text{for all } i_1, \ldots, i_k.$$

[FIGURE 3. Panels (a), (b), (c); see Example 2.]


Now let a r.vec. $X = (X_1, X_2)$ be continuous, $f(x) = f(x_1, x_2)$ be its density, and let $f_1(x_1)$
and $f_2(x_2)$ be the separate marginal densities of the r.v.'s $X_1$ and $X_2$, respectively. Then

$$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2, \qquad f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1.$$

If $k > 2$ and the joint density is a function $f(x) = f(x_1, \ldots, x_k)$, then the marginal density
of $X_m$ is given by

$$f_m(x_m) = \int \cdots \int f(x_1, \ldots, x_k)\,dx_1 \cdots dx_{m-1}\,dx_{m+1} \cdots dx_k,$$

where the integral is a $(k-1)$-dimensional integral with respect to $x_1, \ldots, x_{m-1}, x_{m+1}, \ldots, x_k$.

Proposition 3 Continuous r.v.'s $X_1, \ldots, X_k$ are mutually independent if and only if
$$f(x_1, \ldots, x_k) = f_1(x_1) \cdots f_k(x_k) \quad \text{for any } x_1, \ldots, x_k. \tag{1.3.6}$$


EXAMPLE 2. We use the same idea as in Example 1 for the case of continuous distribu- 
tion. Let ar.vec. X = (X,, Xz) take on values from the square in Fig.3a, and the joint density 
f (1,2) = 1/4 for all points x = (x1,x2) from this square, and f(x;,x2) = 0 otherwise. 

(The total integral of f over the square should be one; see (1.3.5). Hence, if we set f 
equal to a constant, this constant should be equal to one divided by the area of the square.) 

This type of distribution is called uniform, and we say that all points from the square are 
equiprobable. 

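For $|x_1| \le 1$ and $|x_2| \le 1$,

$$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 = \int_{-1}^{1} \frac14\,dx_2 = \frac12, \qquad f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1 = \frac12,$$

and $f(x_1, x_2) = \frac14 = \frac12 \cdot \frac12 = f_1(x_1) f_2(x_2)$. Hence, $X_1, X_2$ are independent, which could be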
predicted from the very beginning. 

Now, let X be uniform in the square depicted in Fig.3b. It is reasonable to guess that in
this case $X_1$ and $X_2$ are dependent, since the value of $X_1$ determines the range within which
$X_2$ can change. To show the dependence rigorously, we will find marginal densities and
show that in this case (1.3.6) is not true.

The square in Fig.3b consists of all points $(x_1, x_2)$ for which $|x_1| + |x_2| \le 1$. The area
of the square equals 2. Hence, $f(x_1, x_2) = 1/2$ for all points in the square mentioned, and
$f(x_1, x_2) = 0$ otherwise. Consequently, for fixed $x_1$, the density $f(x_1, x_2) = 0$ if $|x_2| > 1 - |x_1|$, and

$$f_1(x_1) = \int_{-1+|x_1|}^{1-|x_1|} \frac12\,dx_2 = 1 - |x_1| \ \text{ if } |x_1| \le 1, \text{ and } = 0 \text{ otherwise.}$$

Similarly, $f_2(x_2) = 1 - |x_2|$ if $|x_2| \le 1$, and $= 0$ otherwise. The graph of $f_1(x_1)$ is given
in Fig.3c. The distribution with this density is called triangular; it will appear in this book
repeatedly.

Certainly, $f_1(x_1) f_2(x_2) = (1 - |x_1|)(1 - |x_2|) \neq \frac12 = f(x_1, x_2)$, and by Proposition 3,
$X_1$ and $X_2$ are dependent.
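The marginal density of the second square can also be checked numerically. This is a rough sketch only: it approximates the integral over $x_2$ by a midpoint rule (the step count and the test point are arbitrary choices, not taken from the text) and compares the product of marginals with the joint density.

```python
def joint_density(x1, x2):
    """Uniform density on the square |x1| + |x2| <= 1, whose area is 2."""
    return 0.5 if abs(x1) + abs(x2) <= 1 else 0.0

def marginal_density(x1, steps=20000):
    """Midpoint-rule approximation of the integral of f(x1, x2) over x2 in [-1, 1]."""
    h = 2.0 / steps
    return sum(joint_density(x1, -1 + (k + 0.5) * h) for k in range(steps)) * h

x1, x2 = 0.3, 0.2
f1 = marginal_density(x1)   # close to 1 - |0.3| = 0.7
f2 = marginal_density(x2)   # by symmetry the same formula applies: close to 0.8
print(f1, f2, f1 * f2, joint_density(x1, x2))
```

At this point the product of the marginals is about 0.56, while the joint density equals 0.5, so (1.3.6) fails.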


1.3.3 Cumulative distribution functions 


The cumulative distribution function, or simply the distribution function (d.f.) of a r.v.
X, is the function $F_X(x) = P(X \le x)$. As usual, when it does not lead to confusion, we will
omit the index X.

Note that if $F(B) = P(X \in B)$ is the distribution of X, then the d.f. $F(x) = F((-\infty, x])$.
Thus, if we know the distribution $F(B)$, we know the distribution function $F(x)$. We will
see later that the converse assertion is also true: the distribution function completely
determines $F(B)$ for any set B.

Any distribution function $F(x)$ is non-decreasing, $F(-\infty) = 0$, and $F(\infty) = 1$.

By definition, for any interval (a,b] and r.v. X with a d.f. F(x), 


$$P(a < X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a).$$


From this it follows, in particular, that 


If $F(x)$ is constant on an interval $(a, b]$, then $P(X \in (a, b]) = 0$. (1.3.7)


See also Fig.4. 
Let us consider the limit of $F(x) = P(X \le x)$ as x converges to a number c from the left,
that is, if $x < c$. Since for each $x < c$, the event $\{X \le x\}$ does not include the point c,

$$\lim_{x \to c,\ x < c} P(X \le x) = P(X < c).$$

Clearly, $P(X < c)$ is not equal to $P(X \le c)$ if X assumes the value c with a positive
probability. Set

$$F(c - 0) = P(X < c)$$




FIGURE 4. 


(see Fig.4). Since $P(X = c) = P(X \le c) - P(X < c)$, we have

$$P(X = c) = F(c) - F(c - 0),$$

or, in other words,


For any c, the probability P(X = c) equals the jump of F (x) at the point c. (1.3.8) 


See again Fig.4. 
From (1.3.7)-(1.3.8) it follows that if a discrete r.v. X assumes values $x_1, x_2, \ldots$ with
probabilities $f_1, f_2, \ldots$, respectively, then its d.f. $F(x)$ is constant in all intervals $(x_i, x_{i+1})$, and
makes a jump of $f_i$ at the point $x_i$, $i = 1, 2, \ldots$; see Fig.5.


Now consider the continuous case. Let f(x) be the density of a r.v. X. Then by the 
definitions of density and d.f., 


$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du. \tag{1.3.9}$$


(Certainly, once we wrote x as a limit of integration, we should use another letter inside the 
integral.) 


FIGURE 5. The distribution function of a discrete random variable. 




From (1.3.9) it immediately follows that

$$f(x) = F'(x). \tag{1.3.10}$$


Rigorously speaking, (1.3.10) is true for all x’s at which F (x) is differentiable. A good example is 
the uniform distribution which we will consider in detail in Section 3.2.1. The d.f. and density in this 
case are graphed in Fig.10 on page 31. We see that (1.3.10) is true for all x’s except 0 and 1, where 
F'(x) does not exist. As a matter of fact, the function F (x) is not differentiable at points where f(x) 
is not continuous. The reader may look up this fact in practically any textbook on Calculus, e.g., 
[136]. We skip these formalities since in all schemes we consider in this book, we can exclude from 
consideration x’s for which (1.3.10) does not hold. 

It is worth realizing also that the relations (1.3.9)-(1.3.10) are those between a function and its 
antiderivative, and are strongly related to the second fundamental theorem of Calculus. 


In conclusion, we briefly touch on the multidimensional case. Let $X = (X_1, \ldots, X_k)$, where
the $X_i$’s are r.v.’s, and x stands for a non-random vector $(x_1, \ldots, x_k)$. The d.f. of X is the function

$$F(x) = F(x_1, \ldots, x_k) = P(X_1 \le x_1, \ldots, X_k \le x_k).$$


This is the probability of a “corner” (see Fig.6). 
Suppose that X has a density f(x). By virtue of (1.3.4), the multidimensional counterpart 
of (1.3.9) in this case is the relation 


$$F(x_1, \ldots, x_k) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} f(u_1, \ldots, u_k)\,du_1 \cdots du_k. \tag{1.3.11}$$


Differentiating (1.3.11), we get the counterpart of (1.3.10): 


$$f(x_1, \ldots, x_k) = \frac{\partial^k}{\partial x_1 \cdots \partial x_k} F(x_1, \ldots, x_k). \tag{1.3.12}$$


The last formula turns out to be useful in many problems. However, in general, d.f.’s in the
multidimensional case do not play as important a role as they do in the one-dimensional case.
They represent the probabilities of “corners”, and these sets in the multidimensional framework
have no advantage over other sets, say, circles.

FIGURE 6.


1.3.4 Quantiles 


In the literature, one can find several slightly different definitions of a quantile (see be- 
low). For our further purposes, it is convenient and logical to adopt the following definition. 

Consider a r.v. X with a d.f. $F(x) = P(X \le x)$. Let $\gamma \in [0, 1]$.

If the r.v. X is continuous and its distribution function is strictly increasing, then $q_\gamma$, the
$\gamma$-quantile of X, is the unique number q for which $F(q) = \gamma$; see Fig.7a.

If there are many numbers q for which $F(q) = \gamma$, the definition we adopt chooses the
right end point of the interval where $F(x) = \gamma$; see Figures 7b,c. In the literature, one can
find definitions where the $\gamma$-quantile is the left end point, or the middle, or even any point
from this interval. The difference is not essential.

FIGURE 7. Quantiles.

If the r.v. takes on some values with positive probabilities (and hence the d.f. has “jumps”),
it may happen that there is no number q such that $F(q) = \gamma$; see Fig.7d. Then we choose
the point at which $F(x)$ “jumps” over the level $\gamma$.

In particular, if $X = 0$ with probability one, the point 0 is the $\gamma$-quantile for all $\gamma \in [0, 1)$
(see Fig.7e).

The 0.5-quantile is called a median. If the distribution is continuous, then $P(X < q_{0.5}) =
P(X > q_{0.5}) = 0.5$. (The r.v. is as likely to be larger than the median as it is to be smaller.)
In general, since it may happen that $P(X = q_{0.5}) > 0$, we have $P(X < q_{0.5}) \le 0.5$ and
$P(X > q_{0.5}) \le 0.5$.

Another term in use for a $\gamma$-quantile is a $100\gamma$th percentile.

Formally, the above definitions may be unified as follows. The $\gamma$-quantile $q_\gamma$ is the number
such that $F(q_\gamma - \varepsilon) \le \gamma$ and $F(q_\gamma + \varepsilon) > \gamma$ for any arbitrarily small $\varepsilon > 0$.

The reader familiar with the notion of supremum may realize that the $\gamma$-quantile above may be
also defined as $q_\gamma = \sup\{x : F(x) \le \gamma\}$.
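For a discrete r.v. the supremum definition reduces to a scan of the cumulative probabilities. The sketch below is one possible reading of that definition (the function name `quantile` and the sample distributions are illustrative assumptions, not taken from the text): it returns the first value at which the d.f. rises strictly above the given level, which is the right end point when the d.f. is flat at that level.

```python
from fractions import Fraction
import math

def quantile(values, probs, gamma):
    """Return sup{x : F(x) <= gamma} for a discrete r.v.

    values: possible values in increasing order; probs: their probabilities.
    """
    cumulative = 0
    for x, p in zip(values, probs):
        cumulative += p
        if cumulative > gamma:   # F rises strictly above the level gamma at x
            return x
    return math.inf              # F(x) <= gamma for every x (only when gamma >= 1)

half = Fraction(1, 2)
print(quantile([0], [1], half))                         # 0: X = 0 with probability one
print(quantile([0, 1], [half, half], half))             # 1: F = 1/2 on [0, 1), right end point
print(quantile([0, 1], [half, half], Fraction(1, 4)))   # 0: F jumps over 1/4 at the point 0
```

Exact fractions avoid borderline comparisons such as a cumulative sum of 0.49999 against a level of 0.5.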


1.3.5 Mixtures of distributions 
Let $F_1(B)$ and $F_2(B)$ be two distributions, and let $\alpha \in [0, 1]$. Consider the distribution

$$F^{(\alpha)}(B) = \alpha F_1(B) + (1 - \alpha) F_2(B). \tag{1.3.13}$$

We call the distribution $F^{(\alpha)}(B)$ a mixture of distributions $F_1$, $F_2$. In particular, the d.f.
$F^{(\alpha)}(x) = \alpha F_1(x) + (1 - \alpha) F_2(x)$.


Such a definition admits an explicit interpretation. Let $F_1$ and $F_2$ be the distributions of
r.v.’s $X_1$ and $X_2$, respectively, and a r.v.

$$X = \begin{cases} X_1 & \text{with probability } \alpha, \\ X_2 & \text{with probability } 1 - \alpha. \end{cases} \tag{1.3.14}$$

In other words, we choose $X_1$ with probability $\alpha$, and $X_2$ with probability $1 - \alpha$. (We skip
formalities of defining the r.v.’s $X, X_1, X_2$ on the same sample space.) Then the distribution




of X is the linear combination (1.3.13) of $F_1$ and $F_2$, which we call here a mixture. Indeed,
$F^{(\alpha)}(x) = P(X \le x) = P(X \le x \mid X_1 \text{ is chosen}) P(X_1 \text{ is chosen}) + P(X \le x \mid X_2 \text{ is chosen}) P(X_2 \text{ is chosen})
= P(X_1 \le x)\alpha + P(X_2 \le x)(1 - \alpha) = \alpha F_1(x) + (1 - \alpha) F_2(x)$.


EXAMPLE 1. Let a r.v. X with probability 1/2 equal 1, and with probability 1/2 take
on values from the interval [0, 1], and all these values are equally likely. It is equivalent to
the representation (1.3.14) with $\alpha = \frac12$, the r.v. $X_1 = 1$, and the continuous r.v. $X_2$ assuming
values from [0, 1]. Since these values are equally likely, the density $f_2(x)$ should be constant
on [0, 1] and equal zero for all other x’s. In view of (1.3.2), this leads to $f_2(x) = 1$ for
$x \in [0, 1]$ and $= 0$ otherwise. This distribution is called uniform on [0, 1]; we will consider
it in more detail in Section 3.2.1.

By (1.3.9), $F_2(x) = x$ for $x \in [0, 1]$, $F_2(x) = 0$ for $x < 0$, and $F_2(x) = 1$ for $x > 1$. For more
detail on the d.f. of a uniform distribution see Section 3.2.1.

The r.v. $X_1$ takes on the value 1 with probability one. So, $F_1(x) = 0$ for $x < 1$, $F_1(x) = 1$
for $x \ge 1$ (graph $F_1(x)$ by analogy with Fig.7d). Then


$$F(x) = \frac12 F_1(x) + \frac12 F_2(x), \tag{1.3.15}$$

which amounts to

$$F(x) = \begin{cases} 0 & \text{if } x < 0, \\ \frac12 x & \text{if } 0 \le x < 1, \\ 1 & \text{if } x \ge 1; \end{cases} \tag{1.3.16}$$

the graph is given in Fig.8. Note that the last distribution is neither continuous nor discrete.
It is a mixture of continuous and discrete distributions.


FIGURE 8. 
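As a quick consistency check of this example, the mixture formula (1.3.13) with $\alpha = 1/2$ can be compared with the piecewise form (1.3.16) at a few points. This is an illustrative sketch; the function names are not from the text.

```python
def F1(x):
    """d.f. of X1 = 1 with probability one."""
    return 1.0 if x >= 1 else 0.0

def F2(x):
    """d.f. of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

def mixture_df(x, alpha=0.5):
    """d.f. of the mixture alpha*F1 + (1 - alpha)*F2, as in (1.3.13)."""
    return alpha * F1(x) + (1 - alpha) * F2(x)

def piecewise_df(x):
    """The closed form (1.3.16)."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5 * x
    return 1.0

for x in (-0.5, 0.0, 0.25, 0.999, 1.0, 2.0):
    print(x, mixture_df(x), piecewise_df(x))
```

The two columns agree, and the value rises from just under 1/2 to 1 at the point 1, which is the jump of size 1/2 contributed by the discrete component.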


2 EXPECTATION 
2.1 Definitions 


Below, unless stated otherwise, we assume that all series or integrals under consideration 
exist and are finite. 

The expected value of a r.v. is often defined in textbooks for the discrete and continuous 
cases separately. If a r.v. X is discrete and takes on values $x_1, x_2, \ldots$ with probabilities
$f_1, f_2, \ldots$, respectively, the expected value, or the mean value, by definition, is

$$E\{X\} = \sum_i x_i f_i \tag{2.1.1}$$

(where the summation is taken over all possible values $i$), provided that the above series
converges.




If f(x) is the density of a r.v. X, its expected value 


$$E\{X\} = \int_{-\infty}^{\infty} x f(x)\,dx, \tag{2.1.2}$$


provided that the above integral exists and is finite. 
It may be proved that, similar to (2.1.1), for a function u(x), 


$$E\{u(X)\} = \sum_i u(x_i) f_i, \tag{2.1.3}$$


where the summation is over all possible values of i. 
If f(x) is the density of a r.v. X, the expected value 


$$E\{u(X)\} = \int_{-\infty}^{\infty} u(x) f(x)\,dx. \tag{2.1.4}$$


Clearly, (2.1.1) and (2.1.2) follow from (2.1.3) and (2.1.4) respectively, if we set u(x) = x. 
We can unify and generalize the definitions above by writing 


$$E\{u(X)\} = \int_{-\infty}^{\infty} u(x)\,dF(x), \tag{2.1.5}$$


where F (x) is the d.f. of X. The last integral is called a Riemann-Stieltjes integral and is 
defined as follows: 


• If X has a density $f(x)$, then $dF(x) = f(x)dx$, and the integral above coincides with
(2.1.4).

• If $F(x)$ is constant on an interval $(a, b]$, that is, $P(X \in (a, b]) = 0$, we again define dF
on this interval as a usual differential and have $dF(x) = 0$. (In this case the interval
$(a, b]$ is automatically excluded from integration in (2.1.5).)

• If at a point c, the d.f. $F(x)$ has a jump of $\delta = F(c) - F(c - 0)$, that is, $P(X = c) = \delta$,
then we define $dF(c)$ as the jump at c, that is, $dF(c) = \delta$. This is a definition, but it
is quite natural for the following reason. The differential $dF(x)$ means the change of
$F(x)$ in the infinitesimally small interval $[x, x + dx]$, but if $F(x)$ jumps at c, the change
is not small and equals $F(c) - F(c - 0)$.

So, the part of the integral (2.1.5) at the point c above is $u(c)\delta = u(c)[F(c) - F(c - 0)]$.

• In particular, if X is discrete, and assumes values $x_1, x_2, \ldots$ with probabilities $f_1, f_2, \ldots$,
respectively, we set $dF(x_j) = f_j$ for all j, and $dF(x) = 0$ in all intervals between
$x_j$’s. So defined, the integral $\int_{-\infty}^{\infty} u(x)\,dF(x) = \sum_j u(x_j) f_j$; that is, we have come to
the definition (2.1.3).


In any case, 


We view $dF(x)$ as $P(x \le X \le x + dx)$, the probability that X will
assume a value from an infinitesimally small interval $[x, x + dx]$.




In particular, definition (2.1.5) allows us to consider mixtures of continuous and discrete 
distributions. 


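EXAMPLE 1. Let us return to Example 1.3.5-1. Let $u(x) = x^2$. By (1.3.16), we can
write

$$E\{u(X)\} = \int_{-\infty}^{\infty} u(x)\,dF(x) = \int_0^1 u(x)\,\frac12\,dx + u(1)[F(1) - F(1 - 0)]$$
$$= \frac12 \int_0^1 x^2\,dx + 1^2 \cdot \frac12 = \frac16 + \frac12 = \frac23.$$

Another way to solve the same problem is to use (1.3.15) by writing $dF(x) = \frac12 dF_1(x) +
\frac12 dF_2(x)$ and substituting it into (2.1.5). We have

$$E\{u(X)\} = \int_{-\infty}^{\infty} u(x)\,dF(x) = \frac12 \int_{-\infty}^{\infty} u(x)\,dF_1(x) + \frac12 \int_{-\infty}^{\infty} u(x)\,dF_2(x)$$
$$= \frac12 u(1) + \frac12 \int_0^1 x^2\,dx = \frac12 + \frac16 = \frac23.$$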
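The recipe "integrate the density part, then add $u(c)$ times each jump" can be sketched numerically. The helper below is an assumption-laden illustration (a midpoint rule on a single interval plus a list of atoms; the name `stieltjes_expectation` is hypothetical), applied to the mixed distribution of this example.

```python
def stieltjes_expectation(u, density, interval, atoms, steps=100000):
    """Approximate the integral of u(x) dF(x) for a mixed distribution.

    density: the derivative of F on interval = (a, b), integrated by the midpoint rule;
    atoms: list of (c, jump) pairs, each contributing u(c) * [F(c) - F(c - 0)].
    """
    a, b = interval
    h = (b - a) / steps
    continuous_part = sum(
        u(a + (k + 0.5) * h) * density(a + (k + 0.5) * h) for k in range(steps)
    ) * h
    discrete_part = sum(u(c) * jump for c, jump in atoms)
    return continuous_part + discrete_part

# The d.f. (1.3.16): density 1/2 on (0, 1) plus a jump of 1/2 at the point 1.
value = stieltjes_expectation(lambda x: x ** 2, lambda x: 0.5, (0.0, 1.0), [(1.0, 0.5)])
print(value)  # close to 1/6 + 1/2 = 2/3
```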


Next, we consider a set B in the real line, and the function 


$$I_B(x) = \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{if } x \notin B. \end{cases}$$

The function $I_B(x)$ is called the indicator of B.

Let $u(x)$ in (2.1.5) be equal to $I_B(x)$. Then the r.v. $u(X) = I_B(X) = 1$ if $X \in B$, and 0
otherwise. Hence, $E\{u(X)\} = 1 \cdot P(X \in B) + 0 \cdot P(X \notin B) = P(X \in B) = F_X(B)$. On the
other hand, in view of (2.1.5),

$$E\{u(X)\} = \int_{-\infty}^{\infty} I_B(x)\,dF(x) = \int_B dF(x)$$

(since $I_B(x) = 0$ for $x \notin B$). Thus,

$$F_X(B) = \int_B dF(x), \tag{2.1.6}$$

and we see that, indeed, the d.f. $F(x)$ determines $F_X(B)$ for all B.

In conclusion, we state without proofs two elementary properties of expectation.

(i) For any numbers $c_1, c_2$ and any r.v.’s $X_1, X_2$,

$$E\{c_1 X_1 + c_2 X_2\} = c_1 E\{X_1\} + c_2 E\{X_2\}. \tag{2.1.7}$$

(ii) For any independent r.v.’s $X_1, X_2$,

$$E\{X_1 X_2\} = E\{X_1\} E\{X_2\}, \tag{2.1.8}$$


provided that the expectations above exist. 




2.2 Integration by parts and a formula for expectation 


An essential advantage of the representation (2.1.5) is that it points out the possibility of 
integration by parts. Consider, for simplicity, a positive r.v. X, so its d.f. $F(x) = 0$ for $x < 0$.
Let $u(x)$ be a differentiable function such that the integral $\int_0^{\infty} u(x)\,dF(x)$ exists and is finite.
Set, also for simplicity, $u(0) = 0$. Then

$$E\{u(X)\} = \int_0^{\infty} u(x)\,dF(x) = -\int_0^{\infty} u(x)\,d(1 - F(x))$$
$$= -u(\infty)[1 - F(\infty)] + u(0)[1 - F(0)] + \int_0^{\infty} (1 - F(x))\,du(x).$$

Since $u(0) = 0$, we have $u(0)[1 - F(0)] = 0$. Because $F(\infty) = 1$, if $u(\infty) = \infty$, then in the
expression $u(\infty)[1 - F(\infty)]$, we have the indeterminate form $\infty \cdot 0$. As a matter of fact, we
can set $u(\infty)[1 - F(\infty)] = 0$. To prove this, one should show that $|u(x)|[1 - F(x)] \to 0$ as
$x \to \infty$. We will prove it in Section 2.5. So, eventually

$$E\{u(X)\} = \int_0^{\infty} (1 - F(x))\,du(x). \tag{2.2.1}$$


Setting u(x) = x, we obtain from (2.2.1) a useful formula for the expected value of a 
positive r.v.: 


$$E\{X\} = \int_0^{\infty} (1 - F(x))\,dx. \tag{2.2.2}$$

Some examples are considered in Section 3.
In particular, from (2.2.2) it follows that the expected value equals the area between the graph of
$F(x)$ and the line $y = 1$ (see Fig.9).

FIGURE 9.


Consider now an integer valued r.v. X assuming
only values $0, 1, 2, \ldots$. In this case, the d.f. $F(x)$ is constant on intervals $[n, n + 1)$ for
$n = 0, 1, \ldots$ (see also Fig.5). Then, for any $x \in [n, n + 1)$, we have $1 - F(x) = 1 - F(n) =
P(X > n)$, and from (2.2.2) it follows that

$$E\{X\} = \sum_{n=0}^{\infty} \int_n^{n+1} (1 - F(x))\,dx = \sum_{n=0}^{\infty} P(X > n) \int_n^{n+1} dx = \sum_{n=0}^{\infty} P(X > n) \cdot 1.$$

Thus,

$$E\{X\} = \sum_{n=0}^{\infty} P(X > n). \tag{2.2.3}$$
We will use formulas (2.2.2) and (2.2.3) repeatedly in this book. 
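Formula (2.2.3) is easy to confirm on a small case. The sketch below uses a fair die as a hypothetical example (not from the text) and exact fractions, comparing the direct definition (2.1.1) with the sum of tail probabilities.

```python
from fractions import Fraction

def mean_direct(pmf):
    """E{X} as the sum of n * P(X = n), formula (2.1.1)."""
    return sum(n * p for n, p in pmf.items())

def mean_from_tails(pmf):
    """E{X} as the sum over n >= 0 of P(X > n), formula (2.2.3)."""
    top = max(pmf)  # P(X > n) = 0 for all n >= top
    return sum(sum(p for k, p in pmf.items() if k > n) for n in range(top))

# Hypothetical example: a fair die, values 1..6 with probability 1/6 each.
die = {n: Fraction(1, 6) for n in range(1, 7)}
print(mean_direct(die), mean_from_tails(die))  # 7/2 7/2
```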


2.3 Can we encounter an infinite expected value in models of real 
phenomena? 


Certainly, we can construct a r.v. X for which $E\{X\} = \infty$. Let, say, X take on values
$2, 4, 8, \ldots, 2^k, \ldots$, and so on, with probabilities $\frac12, \frac14, \ldots, 2^{-k}, \ldots$, respectively. Then $E\{X\} =
2 \cdot \frac12 + 4 \cdot \frac14 + \ldots + 2^k \cdot 2^{-k} + \ldots = 1 + 1 + \ldots + 1 + \ldots = \infty$. We consider this r.v. in Section
1.2.2 in connection with the so called St. (Saint) Petersburg’s paradox.

This example, however, is somewhat artificial, and the question is whether we can en- 
counter infinite expectations in models of real phenomena. In this book, we will see several 
such examples. The first is considered below. 


EXAMPLE 1. (Record values.) Let $X_1, X_2, \ldots$ be a sequence of independent identically
distributed (i.i.d.) r.v.’s, and N be the number of the first $X_i$ which is larger than $X_1$.
Formally, $N = \min\{n : X_n > X_1\}$. Let, for instance, $X_1$ be the first payment made by an
insurance company in the new year, $X_2, X_3, \ldots$ be next consecutive payments, and N be the
number of the first payment among $X_2, X_3, \ldots$ which was larger than $X_1$. We show that
$E\{N\} = \infty$.

Let the event $A_{in} = \{X_i \ge X_1, \ldots, X_i \ge X_n\}$, $i = 1, \ldots, n$. So, if $A_{in}$ occurs, then among the
first n r.v.’s, the ith r.v. assumes a record value in the sense that it is not less than the values
of the other $n - 1$ r.v.’s. Then the event $\{N > n\} = A_{1n}$. In view of symmetry, $P(A_{in})$ is
the same for all i. Indeed, because the X’s are i.i.d., among the r.v.’s $X_1, \ldots, X_n$ each has “an
equal chance to take on a value not smaller than the values of the others.” On the other
hand, at least one $A_{in}$ should occur, which means that $\bigcup_{i=1}^{n} A_{in} = \Omega$. Then, by (1.1.4),

$$1 = P(\Omega) = P\Big(\bigcup_{i=1}^{n} A_{in}\Big) \le \sum_{i=1}^{n} P(A_{in}) = n P(A_{1n}).$$

In the last equality, we used that $P(A_{in}) = P(A_{1n})$ for all i. Hence, $P(N > n) = P(A_{1n}) \ge 1/n$
for all $n = 1, 2, \ldots$. Using (2.2.3) and neglecting in it the first term, we have

$$E\{N\} \ge \sum_{n=1}^{\infty} P(N > n) \ge \sum_{n=1}^{\infty} \frac{1}{n}.$$


As we remember from Calculus, the last series (called harmonic) diverges, that is, equals 
infinity. 
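Both ingredients of this argument can be illustrated with a short computation. The sketch assumes, in addition to the text, that ties have probability zero (as for a continuous distribution), so that all $n!$ orderings of $X_1, \ldots, X_n$ are equally likely and $P(A_{1n})$ equals $1/n$ exactly. The second part simply prints a partial sum of the harmonic series.

```python
from fractions import Fraction
from itertools import permutations

def prob_first_is_record(n):
    """P(X_1 is the largest of X_1..X_n) when all n! orderings are equally likely."""
    orderings = list(permutations(range(n)))
    favorable = sum(1 for ranks in orderings if ranks[0] == n - 1)
    return Fraction(favorable, len(orderings))

# P(N > n) = P(A_1n) for n = 1..6, under the no-ties assumption.
tails = [prob_first_is_record(n) for n in range(1, 7)]
print(tails)

# Partial sums of the harmonic series keep growing (roughly like ln n).
partial = 0.0
for n in range(1, 10 ** 5 + 1):
    partial += 1.0 / n
print(partial)
```

The enumeration reproduces the values 1, 1/2, 1/3, and so on, and the partial sum already exceeds 12 after $10^5$ terms.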


To make the definition of an infinite expected value more precise, we say that for a
non-negative r.v. X with a d.f. F, the expected value $E\{X\} = \infty$ if the integral $\int_0^{\infty} x\,dF(x) = \infty$.
If X is non-positive, we say that $E\{X\} = -\infty$ if $\int_{-\infty}^{0} x\,dF(x) = -\infty$.

When dealing with r.v.’s taking positive and negative values, we should be more cautious.
Any r.v. X may be represented as $X = X^+ - X^-$, where $X^+$ and $X^-$ are positive and negative
parts of X, respectively. More precisely, $X^+ = \max\{X, 0\}$, and $X^- = \max\{-X, 0\}$. Note
that both quantities, $X^+, X^-$, are non-negative. Let us write $E\{X\} = E\{X^+\} - E\{X^-\}$.

• If $E\{X^+\} < \infty$, and $E\{X^-\} < \infty$, we define $E\{X\}$ in the usual way, and we say that
$E\{X\}$ is finite. Certainly, in this case, $E\{X\} = E\{X^+\} - E\{X^-\}$.

• If $E\{X^+\} = \infty$, and $E\{X^-\} < \infty$, we say that $E\{X\} = \infty$.

• If $E\{X^+\} < \infty$, and $E\{X^-\} = \infty$, we say that $E\{X\} = -\infty$.

• If $E\{X^+\} = \infty$, and $E\{X^-\} = \infty$, we say that $E\{X\}$ does not exist.


2.4 Moments of r.v.’s. Correlation 


2.4.1 Variance and other moments 


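Assuming that $E\{X^2\} < \infty$, we define the variance of X as the quantity

$$\mathrm{Var}\{X\} = E\{(X - E\{X\})^2\}.$$

Skipping proofs, we list some properties of variance. First, it is easy to verify that

$$\mathrm{Var}\{X\} = E\{X^2\} - (E\{X\})^2. \tag{2.4.1}$$

Secondly, for any constant c,

$$\mathrm{Var}\{X + c\} = \mathrm{Var}\{X\}, \qquad \mathrm{Var}\{cX\} = c^2\,\mathrm{Var}\{X\}. \tag{2.4.2}$$

Furthermore, for any independent r.v.’s X and Y,

$$\mathrm{Var}\{X + Y\} = \mathrm{Var}\{X\} + \mathrm{Var}\{Y\}. \tag{2.4.3}$$

Note that for independent X and Y, $\mathrm{Var}\{X - Y\}$ is equal to $\mathrm{Var}\{X\} + \mathrm{Var}\{Y\}$ rather than to
$\mathrm{Var}\{X\} - \mathrm{Var}\{Y\}$ (!). Indeed, by (2.4.3) and (2.4.2),

$$\mathrm{Var}\{X - Y\} = \mathrm{Var}\{X + (-Y)\} = \mathrm{Var}\{X\} + \mathrm{Var}\{-Y\} = \mathrm{Var}\{X\} + (-1)^2\,\mathrm{Var}\{Y\} = \mathrm{Var}\{X\} + \mathrm{Var}\{Y\}.$$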


The quantity $\sigma_X = \sqrt{\mathrm{Var}\{X\}}$ is called the standard deviation of X.

For a r.v. X, the quantities $m_k = E\{X^k\}$ for a natural k and $\bar{m}_k = E\{|X|^k\}$ for any $k > 0$
are called the kth moment and the kth absolute moment, respectively. So, the expectation
$E\{X\}$ is the first moment.

Set $m = m_1$. The quantities $\mu_k = E\{(X - m)^k\}$ and $\bar{\mu}_k = E\{|X - m|^k\}$ are called the kth
central moment and the kth absolute central moment, respectively. So, the variance is the
second central moment.

Before considering the notions of covariance and correlation, we present a fundamental 
inequality. 


2.4.2 The Cauchy-Schwarz inequality 


For any r.v.’s $\xi$ and $\eta$ with finite second moments,

$$(E\{\xi\eta\})^2 \le E\{\xi^2\} E\{\eta^2\}. \tag{2.4.4}$$

Proof. Let t be a real number. Then

$$0 \le E\{(\xi - t\eta)^2\} = E\{\xi^2\} - 2t E\{\xi\eta\} + t^2 E\{\eta^2\}. \tag{2.4.5}$$

Denote by $Q(t)$ the r.-h.s. of (2.4.5). As a function of t, this is a quadratic function. Since
$Q(t) \ge 0$ for all t, the discriminant of $Q(t)$ should be non-positive. The discriminant is
equal to $4(E\{\xi\eta\})^2 - 4E\{\xi^2\}E\{\eta^2\} = 4[(E\{\xi\eta\})^2 - E\{\xi^2\}E\{\eta^2\}]$. We see that it is
non-positive if and only if (2.4.4) is true. ■




It is straightforward to verify that, if $\xi = t_0\eta$ for some number $t_0$, then (2.4.4) becomes an
equality.

Conversely, let $(E\{\xi\eta\})^2 = E\{\xi^2\}E\{\eta^2\}$. Then the discriminant above equals zero.
Consequently, there exists only one root, say $t_0$, of the equation $Q(t) = 0$. In this case,
$E\{(\xi - t_0\eta)^2\} = 0$. The r.v. $(\xi - t_0\eta)^2$ is non-negative. Its expectation may be equal to zero
only if $\xi - t_0\eta = 0$ with probability one. Thus,

$(E\{\xi\eta\})^2 = E\{\xi^2\}E\{\eta^2\}$ iff $\xi = t_0\eta$ for some $t_0$ with probability one. (2.4.6)


2.4.3 Covariance and correlation 


Covariance is a measure of dependency between two r.v.’s. Let $X_1, X_2$ be r.v.’s with
finite second moments, $m_i = E\{X_i\}$, $\sigma_i^2 = \mathrm{Var}\{X_i\} > 0$. (Unlike above, here the symbol $m_i$
denotes the mean value of $X_i$.) We call the covariance between $X_1, X_2$ the quantity

$$\mathrm{Cov}\{X_1, X_2\} = E\{(X_1 - m_1)(X_2 - m_2)\}.$$

Multiplying the variables in the parentheses above, one may prove that

$$\mathrm{Cov}\{X_1, X_2\} = E\{X_1 X_2\} - m_1 m_2 \tag{2.4.7}$$

(compare with (2.4.1)).

Note also that $\mathrm{Cov}\{X_1, X_1\} = E\{(X_1 - m_1)^2\} = \mathrm{Var}\{X_1\}$.

If $X_1, X_2$ are independent, $\mathrm{Cov}\{X_1, X_2\} = 0$. Indeed, in this case, by virtue of (2.1.8),
$\mathrm{Cov}\{X_1, X_2\} = E\{X_1 - m_1\}E\{X_2 - m_2\} = 0$, since $E\{X_1 - m_1\} = E\{X_1\} - m_1 = m_1 - m_1 = 0$.
(Certainly, $E\{X_2 - m_2\}$ also equals zero, but it suffices to consider $E\{X_1 - m_1\}$.)

Thus,
If $\mathrm{Cov}\{X_1, X_2\} \neq 0$, then the r.v.’s $X_1, X_2$ are dependent.


The converse to the above assertion is not true. 


EXAMPLE 1. Consider a r.vec. $X = (X_1, X_2)$ whose distribution is presented in Fig.2b
in Example 1.3.2-1. As was noted there, the r.v.’s $X_1, X_2$ in this case are dependent. On the
other hand, as is easy to compute, $E\{X_1\} = E\{X_2\} = 0$, and in view of (2.4.7), $\mathrm{Cov}\{X_1, X_2\} =
E\{X_1 X_2\}$. In our example, for any outcome, one of the variables, $X_1$ or $X_2$, equals zero, so
$X_1 X_2 = 0$. Hence, $\mathrm{Cov}\{X_1, X_2\} = 0$.
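The two facts used in this example, zero covariance and dependence, can be verified directly. The sketch assumes the Fig.2b layout described in Example 1.3.2-1 (four equally likely points on the coordinate axes); the helper `expectation` is an illustrative name.

```python
from fractions import Fraction

quarter = Fraction(1, 4)
# Assumed layout of Fig.2b: four equally likely points on the axes.
joint = {(1, 0): quarter, (-1, 0): quarter, (0, 1): quarter, (0, -1): quarter}

def expectation(g):
    """E{g(X1, X2)} for the discrete joint distribution above."""
    return sum(g(x1, x2) * p for (x1, x2), p in joint.items())

m1 = expectation(lambda x1, x2: x1)
m2 = expectation(lambda x1, x2: x2)
cov = expectation(lambda x1, x2: x1 * x2) - m1 * m2       # formula (2.4.7)

p_both_zero = joint.get((0, 0), 0)                          # P(X1 = 0, X2 = 0)
p_x1_zero = sum(p for (x1, _), p in joint.items() if x1 == 0)
p_x2_zero = sum(p for (_, x2), p in joint.items() if x2 == 0)
print(cov, p_both_zero, p_x1_zero * p_x2_zero)              # 0 0 1/4
```

The covariance is zero, yet $P(X_1 = 0, X_2 = 0) = 0$ differs from $P(X_1 = 0)P(X_2 = 0) = 1/4$, so the variables are dependent.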


Thus, independence implies zero covariance, but not vice versa. 
It is straightforward to verify that for any r.v.’s X1, X2, 


$$\mathrm{Var}\{X_1 + X_2\} = \mathrm{Var}\{X_1\} + \mathrm{Var}\{X_2\} + 2\,\mathrm{Cov}\{X_1, X_2\}.$$

Thus, for (2.4.3) to be true, we only need $\mathrm{Cov}\{X_1, X_2\}$ to be zero, which, as we saw, is a
weaker property than independence.

The correlation coefficient of r.v.’s $X_1, X_2$, briefly the correlation, is the quantity

$$\mathrm{Corr}\{X_1, X_2\} = \frac{\mathrm{Cov}\{X_1, X_2\}}{\sigma_1 \sigma_2}. \tag{2.4.8}$$




(We have assumed in the beginning of this section that $\sigma_i > 0$.) Note that correlation
is a dimensionless characteristic. Say, if we measure $X_1, X_2$ in dollars, the dimension of
$\mathrm{Cov}\{X_1, X_2\}$ and $\mathrm{Var}\{X_i\}$ is dollar$^2$, while $\mathrm{Corr}\{X_1, X_2\}$, as follows from (2.4.8), does not
have a dimension. For this reason, correlation, which may be viewed as the normalized
covariance, is a more adequate measure of dependence.

The following properties are true. 


1. $-1 \le \mathrm{Corr}\{X_1, X_2\} \le 1$ for all $X_1, X_2$.

2. If $\mathrm{Corr}\{X_1, X_2\} = 1$, then $X_2 = a + bX_1$ for some a and $b > 0$.

3. If $\mathrm{Corr}\{X_1, X_2\} = -1$, then $X_2 = a + bX_1$ for some a and $b < 0$.

R.v.’s $X_1, X_2$ for which $\mathrm{Corr}\{X_1, X_2\} = 0$ are called non-correlated or uncorrelated. R.v.’s
for which $\mathrm{Corr}\{X_1, X_2\} > 0$ are called positively correlated. If $\mathrm{Corr}\{X_1, X_2\} < 0$, the r.v.’s
$X_1, X_2$ are negatively correlated. R.v.’s for which $\mathrm{Corr}\{X_1, X_2\} = \pm 1$ are called perfectly
correlated.

A good exercise is to derive Property 1 from (2.4.4), and Properties 2-3 from (2.4.6). (Set
$\xi = X_1 - m_1$, $\eta = X_2 - m_2$.)


2.5 Inequalities for deviations 


For any r.v. X, the probability $P(X > x) \to 0$ as $x \to \infty$. Our next goal is to estimate
how fast it is vanishing. For large x’s, the probability $P(X > x)$ is often called a tail of the
distribution of X, or the probability of large deviation. Similarly, one can consider $P(X < x)$
as $x \to -\infty$. The latter is referred to as a left tail, and the former, in this context, as a right
tail.


Proposition 4 (An inequality for deviations). Let a function $u(x)$ be non-negative and
non-decreasing. Then for any x and any r.v. X,

$$P(X \ge x) \le \frac{E\{u(X)\}}{u(x)}, \tag{2.5.1}$$

provided that the r.-h.s. of (2.5.1) is finite.

Replacing X in (2.5.1) by $|X|$, and setting $u(x) = x^k$ for $x \ge 0$ and $k > 0$, we come to

Corollary 5 (Markov’s inequality). For any $x > 0$,

$$P(|X| \ge x) \le \frac{E\{|X|^k\}}{x^k}. \tag{2.5.2}$$

Let $m = E\{X\}$. Setting $k = 2$ in (2.5.2) and replacing X by $X - m$, we come to

Corollary 6 (Chebyshev’s inequality). For any $x > 0$,

$$P(|X - m| \ge x) \le \frac{E\{(X - m)^2\}}{x^2} = \frac{\mathrm{Var}\{X\}}{x^2}. \tag{2.5.3}$$


Proof of Proposition 4. Let $F(x)$ be the d.f. of X. Using consecutively in the inequalities
below the facts that $u(x)$ is non-negative and non-decreasing, we have

$$E\{u(X)\} = \int_{-\infty}^{\infty} u(z)\,dF(z) \ge \int_x^{\infty} u(z)\,dF(z) \ge \int_x^{\infty} u(x)\,dF(z) = u(x) \int_x^{\infty} dF(z) = u(x) P(X \ge x). \ ■ \tag{2.5.4}$$

► Let X be non-negative, and $x > 0$. In (2.5.4), we have shown that

$$\int_x^{\infty} u(z)\,dF(z) \ge u(x) P(X \ge x). \tag{2.5.5}$$

If the integral $\int_0^{\infty} u(x)\,dF(x)$ is finite, the l.-h.s. of (2.5.5) converges to zero as $x \to \infty$. Then
so does the r.-h.s., which we promised to prove in Section 2.2. ◄
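To get a feeling for how conservative Chebyshev's inequality (2.5.3) can be, one may compare it with an exact tail probability. The sketch below uses a fair die as a hypothetical example (not from the text) and exact fractions.

```python
from fractions import Fraction

# Hypothetical example: a fair die.
pmf = {n: Fraction(1, 6) for n in range(1, 7)}
m = sum(n * p for n, p in pmf.items())
var = sum((n - m) ** 2 * p for n, p in pmf.items())

def exact_tail(x):
    """P(|X - m| >= x), computed directly from the distribution."""
    return sum(p for n, p in pmf.items() if abs(n - m) >= x)

def chebyshev_bound(x):
    """Right-hand side of (2.5.3)."""
    return var / x ** 2

for x in (Fraction(3, 2), Fraction(2), Fraction(5, 2)):
    print(x, exact_tail(x), chebyshev_bound(x))
```

For the deviation 5/2 the exact probability is 1/3, while the bound gives 7/15; for the deviation 3/2 the bound exceeds one and is therefore uninformative.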


2.6 Linear transformations of r.v.’s. Normalization 


Let X be a r.v. with a d.f. $F(x)$. Denote by $f(x)$, $m$, and $\sigma^2$ the density, the expected value,
and the variance of X, respectively. Let a r.v. $Y = a + bX$, where a and b are numbers, $b > 0$.
Then the d.f. of Y is the function

$$F_Y(x) = P(Y \le x) = P(a + bX \le x) = P(X \le (x - a)/b) = F((x - a)/b).$$

If the density $f(x)$ exists, we can use (1.3.10), which implies that the density of Y is
$f_Y(x) = \frac{d}{dx} F_Y(x) = \frac{1}{b} f((x - a)/b)$. Using rules (2.1.7) and (2.4.2) for means and variances,
we summarize all of this in the following table.

                  | the d.f.       | the density            | the mean | the variance
X                 | F(x)           | f(x)                   | m        | σ²
a + bX, b > 0     | F((x − a)/b)   | (1/b) f((x − a)/b)     | a + bm   | b²σ²           (2.6.1)

Let $X' = X - m$. Then

$$E\{X'\} = E\{X\} - m = m - m = 0.$$

The r.v. $X'$ is called centered, and the operation itself is called centering.
Next, assuming $\sigma > 0$, consider the r.v.

$$X^* = \frac{X - m}{\sigma}. \tag{2.6.2}$$




The reader can verify directly or by making use of rule (2.6.1) that

$$E\{X^*\} = 0, \qquad \mathrm{Var}\{X^*\} = 1. \tag{2.6.3}$$


We call such a r.v. normalized, and the operation (2.6.2)—standard normalization, or 
simply normalization. We will use this operation repeatedly in this book. It is worth 
emphasizing that X* is the same r.v. X, but considered, so to speak, in a standard scale. 


EXAMPLE 1. Let $S_n = X_1 + \ldots + X_n$, where the $X_i$’s are i.i.d. r.v.’s with the same distribution
as X above. Sometimes such r.v.’s are called independent replicas of X. By virtue of rules
(2.1.7) and (2.4.3),

$$E\{S_n\} = mn, \qquad \mathrm{Var}\{S_n\} = \sigma^2 n,$$

and for increasing n, the last characteristics are increasing. We can rescale $S_n$ by defining

$$S_n^* = \frac{S_n - mn}{\sigma\sqrt{n}}.$$

For $S_n^*$, we have $E\{S_n^*\} = 0$ and $\mathrm{Var}\{S_n^*\} = 1$.


3 SOME BASIC DISTRIBUTIONS 


In this section, we list some distributions playing an important role in theory and appli- 
cations. Table 1 in Appendix, Section 1 presents a summary of the distributions we discuss 
below. In the next chapters, we will extend this list. Proofs of facts given below may be found
in practically any textbook on Probability; see, e.g., [102], [116], [120], [122]. 


3.1 Discrete distributions 
3.1.1 The binomial distribution 


Let $n \ge 1$ be an integer and $p \in [0, 1]$. The binomial distribution with parameters n and
p is the distribution of a r.v. X taking values $0, 1, \ldots, n$, and such that

$$P(X = k) = \binom{n}{k} p^k q^{n-k}, \tag{3.1.1}$$

where $q = 1 - p$, and

$$\binom{n}{k} = \frac{n!}{k!(n - k)!} = \frac{n(n - 1) \cdots (n - k + 1)}{k!}, \tag{3.1.2}$$

the number of ways to choose k objects from n distinct objects.
For future use, let us observe that the quantity $\binom{r}{k}$ may be defined for any real number r
and integer $k \ge 0$, if we adopt as a definition the last expression in (3.1.2), setting

$$\binom{r}{k} = \frac{r(r - 1) \cdots (r - k + 1)}{k!}. \tag{3.1.3}$$




The binomial distribution with parameters n and p above usually appears in applications 
as the distribution of the number of successes in the sequence of n independent trials if the 
probability of success in a separate trial is equal to p. Adopting this interpretation, we can 
represent X as follows. 

Consider the r.v.’s $X_1, \ldots, X_n$ such that $X_i = 1$ if the ith trial is successful, and $X_i = 0$
otherwise. Then the total number of successes is given by

$$X = X_1 + \ldots + X_n. \tag{3.1.4}$$

The r.v.’s $X_i$ defined above are called Bernoulli’s variables. We have assumed that $P(X_i =
1) = p$, and since the trials are independent, the r.v.’s $X_i$ are independent. Because $E\{X_i\} =
p$ and $\mathrm{Var}\{X_i\} = pq$ (the reader is encouraged to show this), from (3.1.4) it follows that

$$E\{X\} = np \quad \text{and} \quad \mathrm{Var}\{X\} = npq.$$
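These moment formulas can be confirmed directly from (3.1.1) for particular parameters. A minimal sketch with illustrative values $n = 10$, $p = 3/10$ (chosen here, not taken from the text), using exact fractions:

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """List of P(X = k), k = 0..n, by formula (3.1.1)."""
    q = 1 - p
    return [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]

n, p = 10, Fraction(3, 10)     # illustrative parameters
pmf = binomial_pmf(n, p)
mean = sum(k * f for k, f in enumerate(pmf))
var = sum(k ** 2 * f for k, f in enumerate(pmf)) - mean ** 2   # formula (2.4.1)
print(sum(pmf), mean, var)     # 1 3 21/10
```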


The next distribution we consider is multivariate, and represents a natural generalization 
of the binomial distribution. 


3.1.2 The multinomial distribution 


Assume that each of n independent trials may result in any of $l$ possible outcomes with
respective probabilities $p_1, \ldots, p_l$ such that $\sum_{i=1}^{l} p_i = 1$. Denote by $K_i$ the number of trials
with outcome i. Consider the joint distribution of the r.v.’s $K_1, \ldots, K_l$. Let $m_1, \ldots, m_l$ be
non-negative integers such that $m_1 + \ldots + m_l = n$. Then for any such integers,

$$P(K_1 = m_1, \ldots, K_l = m_l) = \frac{n!}{m_1! \cdots m_l!}\, p_1^{m_1} \cdots p_l^{m_l} \tag{3.1.5}$$

(see, e.g., [102], [116], [122]). It is worthwhile to emphasize that the marginal distributions
of the $K_i$’s are binomial; for example,

$$P(K_1 = k) = \binom{n}{k} p_1^k (1 - p_1)^{n-k}. \tag{3.1.6}$$

Formally, relation (3.1.6) follows from (3.1.5) if we set $m_1 = k$ and add up all probabilities
(3.1.5) over all possible values of $m_2, \ldots, m_l$. However, we can justify (3.1.6) without
calculations if we consider n independent trials above, and call a trial successful if it results in
the first outcome. Then $K_1$ is the number of successful trials.
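The formal route, summing (3.1.5) over the remaining counts, can be carried out explicitly for a small case. The sketch below takes $l = 3$ outcomes and $n = 6$ trials with illustrative probabilities (these numbers are assumptions made for the example) and checks (3.1.6) exactly.

```python
from fractions import Fraction
from math import comb, factorial

def multinomial_prob(counts, probs):
    """P(K_1 = m_1, ..., K_l = m_l) by formula (3.1.5)."""
    value = Fraction(factorial(sum(counts)))
    for m, p in zip(counts, probs):
        value = value / factorial(m) * p ** m
    return value

n = 6
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # illustrative, l = 3

def marginal_k1(k):
    """Sum (3.1.5) over all m_2, m_3 with k + m_2 + m_3 = n."""
    return sum(multinomial_prob((k, m2, n - k - m2), probs) for m2 in range(n - k + 1))

def binomial_k1(k):
    """Right-hand side of (3.1.6)."""
    p1 = probs[0]
    return comb(n, k) * p1 ** k * (1 - p1) ** (n - k)

print(all(marginal_k1(k) == binomial_k1(k) for k in range(n + 1)))  # True
```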


3.1.3 The geometric distribution 


Two closely related distributions are called geometric in the literature. First, this is the
distribution of a r.v. N taking values $1, 2, \ldots$ and such that

$$f_k = P(N = k) = p q^{k-1}, \tag{3.1.7}$$

where the parameter $p \in [0, 1]$, and $q = 1 - p$.

A classical example is the distribution of the number N of the first successful trial in a
sequence of independent trials with the same probability of success p. In this case, the
event $\{N > k\}$ occurs if the first k trials are not successful, which implies that

$$P(N > k) = q^k. \tag{3.1.8}$$




In some applications it is more convenient to use the term ‘geometric distribution’ not
for the distribution above but for the distribution of the r.v. $K = N - 1$ which, naturally,
assumes values $0, 1, 2, \ldots$. Then

$$P(K = k) = p q^k, \quad k = 0, 1, 2, \ldots. \tag{3.1.9}$$

It follows from (3.1.8) that

$$P(K > k) = q^{k+1}. \tag{3.1.10}$$

The first version of the geometric distribution has the following property: for any integers
m and k,

$$P(N > m + k \mid N > k) = P(N > m). \tag{3.1.11}$$

We may clarify this in the following way. Assume that there was no success during the first
k trials. The property (3.1.11) means that this fact has no effect on how long we will wait
for a success after k trials: the probability that it will happen after an additional m trials
does not depend on k.

Such a property is called the memoryless or the lack of memory property. In comparison
with the exponential distribution that we consider later, it is worth emphasizing that (3.1.11)
is true only for integers m and k. As a good exercise, the reader can check that, for instance,
for $m = k = 2.5$ the relation (3.1.11) is not true.

It is easy to double check that for the r.v. K, (3.1.11) should be slightly changed:

$$P(K > m + k \mid K > k) = P(K > m - 1).$$
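The exercise about non-integer arguments can be done numerically. The sketch below relies on the observation that for an integer-valued N and real $x \ge 0$ the event $\{N > x\}$ coincides with $\{N > \lfloor x \rfloor\}$, so the tail equals $q^{\lfloor x \rfloor}$ (this extension to real x and the value $p = 0.3$ are assumptions made for the illustration).

```python
import math

def tail_N(x, p):
    """P(N > x) for the geometric r.v. N on 1, 2, ...; equals q**k at integer k by (3.1.8)."""
    q = 1 - p
    return q ** max(math.floor(x), 0)

def tail_K(x, p):
    """P(K > x) for K = N - 1, that is, P(N > x + 1)."""
    return tail_N(x + 1, p)

def conditional_tail(tail, m, k, p):
    """P(variable > m + k | variable > k) for the given tail function."""
    return tail(m + k, p) / tail(k, p)

p = 0.3   # illustrative success probability
print(conditional_tail(tail_N, 3, 4, p), tail_N(3, p))         # integers: the two agree
print(conditional_tail(tail_N, 2.5, 2.5, p), tail_N(2.5, p))   # non-integers: they differ
print(conditional_tail(tail_K, 3, 4, p), tail_K(3 - 1, p))     # the modified relation for K
```

With $m = k = 2.5$ the conditional probability is $q^3$ while $P(N > 2.5) = q^2$, so (3.1.11) indeed fails.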
It may be computed that 


$$E\{N\} = \frac{1}{p}, \qquad E\{K\} = \frac{q}{p}, \qquad \text{and} \qquad \mathrm{Var}\{N\} = \mathrm{Var}\{K\} = \frac{q}{p^2}. \tag{3.1.12}$$

(See practically any textbook on Probability, e.g., [102], [116], [122, p.67], and also Section
4.4 where we compute moments of the negative binomial distribution.)
In this book, we will primarily consider the geometric distribution in the sense of K.


3.1.4 The negative binomial distribution 


Consider again a sequence of independent trials each having probability p of being a
success. Suppose that the trials are performed until a total of $\nu$ successes is accumulated,
and let $N_\nu$ be the number of the trials required. In other words, $N_\nu$ is the number of the trial
at which the $\nu$th success occurs. Then $N_\nu$ takes on values $\nu, \nu + 1, \ldots$. The $\nu$th success may
occur at the mth trial only if the mth trial is successful, and among the previous $m - 1$ trials
there are exactly $\nu - 1$ successes. Then, as it follows from (3.1.1),

$$P(N_\nu = m) = \binom{m - 1}{\nu - 1} p^{\nu - 1} q^{m - \nu} \cdot p = \binom{m - 1}{\nu - 1} p^{\nu} q^{m - \nu}, \quad m = \nu, \nu + 1, \ldots \tag{3.1.13}$$

(see also, e.g., [102], [116], [122]).
Denote by $T_i$ the number of trials after the $(i - 1)$th success until the ith success occurs.
For $i = 1$, we set $T_1 = N_1$. Clearly, $N_\nu = T_1 + T_2 + \ldots + T_\nu$.




Since the trials are independent, the r.v.'s $T_i$ are independent. For the same reason, to
wait for the ith success after the (i - 1)th success has occurred is the same as to wait for
the first success. So, all $T_i$'s have the geometric distribution. Then, in view of (3.1.12),

$$E\{N_\nu\} = \frac{\nu}{p}, \quad \mathrm{Var}\{N_\nu\} = \frac{\nu q}{p^2}. \tag{3.1.14}$$


As in the case of the geometric distribution, we consider an alternative definition of the
negative binomial distribution. This is the distribution of the r.v. $K_\nu = N_\nu - \nu$, which takes
on values 0,1,2,.... Since $P(K_\nu = m) = P(N_\nu = m + \nu)$, it follows from (3.1.13) that

$$P(K_\nu = m) = \binom{\nu+m-1}{\nu-1} p^{\nu} q^{m}, \quad m = 0,1,2,\ldots,$$

or, due to the formula $\binom{n}{k} = \binom{n}{n-k}$,

$$P(K_\nu = m) = \binom{\nu+m-1}{m} p^{\nu} q^{m}, \quad m = 0,1,2,\ldots. \tag{3.1.15}$$

From (3.1.14), we get that

$$E\{K_\nu\} = \frac{\nu}{p} - \nu = \frac{\nu q}{p}, \quad \mathrm{Var}\{K_\nu\} = \frac{\nu q}{p^2}. \tag{3.1.16}$$


Distribution (3.1.15) appears in many applications, including those that are not related
to a sequence of trials and numbers of successes. Moreover, in some applications, the
parameter ν in (3.1.15) is positive but not necessarily an integer. The last instance requires
clarification.

Consider a r.v. $K_\nu$, not connected with a sequence of trials, whose distribution is formally
defined by (3.1.15). The parameter ν in (3.1.15) is positive, and the coefficients $\binom{\nu+m-1}{m}$
are defined in accordance with the formula (3.1.3). Since ν > 0, all these coefficients are
positive, and hence all probabilities in (3.1.15) are also positive. Then, to show that the
distribution is well defined, it remains to prove that the sum of these probabilities equals
one. To this end, we use the Taylor expansion (4.2.9) from Appendix for the function
$(1-x)^{-\nu}$, which gives

$$\sum_{m=0}^{\infty} P(K_\nu = m) = p^{\nu} \sum_{m=0}^{\infty} \binom{\nu+m-1}{m} q^{m} = p^{\nu} (1-q)^{-\nu} = p^{\nu} p^{-\nu} = 1.$$

The distribution (3.1.15) is called negative binomial with parameters p and ν. If ν is an
integer, the distribution of the r.v. $K_\nu + \nu$ coincides with the distribution of the number of
the trial at which the νth success occurs.
If ν = 1, this is the geometric distribution with parameter p.
Formulas (3.1.16) remain true in the general case of an arbitrary positive ν. An easy way
to show this is to use moment generating functions, which we will do in Section 4.4.
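
For a non-integer ν, the claims above can be checked numerically. The sketch below is illustrative only: it assumes that the generalized binomial coefficient in (3.1.15) can be written as Γ(ν+m)/(Γ(m+1)Γ(ν)), truncates the series at 400 terms, and uses the arbitrary values p = 0.4 and ν = 2.5; the function name `nb_pmf` is hypothetical.

```python
import math

def nb_pmf(m: int, p: float, nu: float) -> float:
    """P(K_nu = m) as in (3.1.15), for any nu > 0.

    The coefficient C(nu + m - 1, m) is computed through log-gamma values
    as Gamma(nu + m) / (Gamma(m + 1) * Gamma(nu)).
    """
    log_coef = math.lgamma(nu + m) - math.lgamma(m + 1) - math.lgamma(nu)
    return math.exp(log_coef + nu * math.log(p) + m * math.log(1.0 - p))

p, nu = 0.4, 2.5                     # illustrative values, nu is not an integer
probs = [nb_pmf(m, p, nu) for m in range(400)]   # the truncated tail is negligible

total = sum(probs)                                        # close to 1
mean = sum(m * pr for m, pr in enumerate(probs))          # close to nu*q/p
var = sum(m * m * pr for m, pr in enumerate(probs)) - mean ** 2   # close to nu*q/p**2
```

With these values, νq/p = 3.75 and νq/p² = 9.375, in agreement with (3.1.16).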


3.1.5 The Poisson distribution 


For reasons discussed later in Sections 3.2.1 and 5.2, this distribution plays a very im- 
portant role in theory and applications. Formally, this is the distribution of a r.v. N taking 
values 0,1,2,... and such that 


$$P(N = k) = e^{-\lambda} \frac{\lambda^{k}}{k!}, \quad k = 0,1,2,\ldots.$$


FIGURE 10. The density and the distribution function of a random variable
uniform on [a,b].


The positive parameter λ has a simple meaning:
$$E\{N\} = \lambda. \tag{3.1.17}$$

The variance coincides with the mean value:
$$\mathrm{Var}\{N\} = \lambda. \tag{3.1.18}$$


This is not an accident and may be explained by making use of properties of the Poisson 
process we consider in Section 4.2 devoted to random processes. Proofs of (3.1.17)-(3.1.18) 
use expansion (4.2.5) in Appendix, and may be found in practically any textbook on Prob- 
ability. 

In this book, we will repeatedly return to properties of the Poisson distribution. In particular,
we will prove in Section 2.2.1.2 that, if $N_1$ and $N_2$ are independent Poisson r.v.'s with
parameters $\lambda_1$ and $\lambda_2$, respectively, then the sum $N_1 + N_2$ has the Poisson distribution with
parameter $\lambda_1 + \lambda_2$.
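
Before the proof, the additivity property can be checked numerically by direct convolution of two Poisson distributions. This is only a sketch; the parameter values 1.5 and 2.0 and the function names are illustrative assumptions.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(N = k) for a Poisson r.v. with parameter lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def pmf_of_sum(k: int, lam1: float, lam2: float) -> float:
    """P(N1 + N2 = k) for independent Poisson r.v.'s, by direct convolution."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2)
               for j in range(k + 1))

lam1, lam2 = 1.5, 2.0   # illustrative values

# Largest discrepancy between the convolution and the Poisson(lam1 + lam2) probabilities.
max_gap = max(abs(pmf_of_sum(k, lam1, lam2) - poisson_pmf(k, lam1 + lam2))
              for k in range(30))
```

The discrepancy `max_gap` is at the level of rounding error.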


3.2 Continuous distributions 
3.2.1 The uniform distribution and simulation of r.v.’s 


We already considered this distribution. Now, we explore it in more detail. Let a continuous
r.v. X assume all values from an interval [a,b] and only from [a,b]. The last fact
implies that $f_X(x)$, the density of X, should be equal to zero for x ∉ [a,b]. If we assume that
all values from [a,b] are equally likely, we must set $f_X(x)$ equal to a constant c for x ∈ [a,b],
since otherwise these values will not be equally likely. On the other hand, we should have
$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$, which implies that c should be 1/(b - a). Thus, we have arrived at

$$f_X(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } x \in [a,b], \\ 0 & \text{otherwise}; \end{cases} \tag{3.2.1}$$


see Fig.10. Substituting (3.2.1) into (1.3.9), it is not difficult to derive that the distribution 


function of X is

$$F_X(x) = \begin{cases} 0 & \text{if } x < a, \\ \dfrac{x-a}{b-a} & \text{if } a \le x \le b, \\ 1 & \text{if } x > b; \end{cases} \tag{3.2.2}$$




see again Fig.10. 

Note that, if a r.v. Z is uniformly distributed on [0,1], then the r.v. Y = a + (b - a)Z is
uniformly distributed on [a,b], that is, Y has the same distribution as X above. Formally, it 
follows, for example, from (2.6.1), but one may show it without calculations. If Z assumes 
values from [0,1], then Y assumes values from [a,b]. If values of Z from [0, 1] are equally 
likely, the same is true for values of Y from [a,b]. 

From the last fact it follows that if we have a generator of random numbers from [0, 1], 
we do not need a special generator for simulating values of X. We would simulate values 
of Z and apply the linear transformation a + (b—a)Z. 


Making use of (2.1.5) and (2.4.1), we have

$$E\{Z\} = \int_0^1 x\,dx = \frac{1}{2}, \quad E\{Z^2\} = \int_0^1 x^2\,dx = \frac{1}{3}, \quad \mathrm{Var}\{Z\} = \frac{1}{12}.$$

Hence,

$$E\{X\} = E\{Y\} = a + (b-a)\cdot\frac{1}{2} = \frac{a+b}{2},$$

which is to be expected. Indeed, since all values are equally likely, the mean should be in
the middle of [a,b]. By (2.6.1),

$$\mathrm{Var}\{X\} = \mathrm{Var}\{Y\} = (b-a)^2\,\mathrm{Var}\{Z\} = \frac{(b-a)^2}{12}.$$


It is worthwhile to warn the reader against the following 
widespread but incorrect reasoning. 


Assume that we know that a r.v. ξ takes on values from, say, [0,1], but we have no
additional information about the distribution of ξ. Since we equally do not know anything
about chances of values from [0,1], we may consider these values equally likely. Hence,
we can assume that ξ is uniform. Then, by (3.2.2), the d.f. of ξ is $F_\xi(x) = x$ for x ∈ [0,1].

But the r.v. ξ² also takes on values from [0,1], and we do not have any additional information
about its distribution either. Then, on the same grounds, we can set $F_{\xi^2}(x) = x$, and
write $x = F_\xi(x) = P(\xi \le x) = P(\xi^2 \le x^2) = F_{\xi^2}(x^2) = x^2$. So, we have arrived at a false
assertion that x = x².

The above reasoning was faulty because the absence of information does not allow us
to jump to any particular conclusion about the distribution of ξ. Knowing nothing means
that we cannot say anything about the distribution except that it is concentrated on [0, 1].
By contrast, the assertion of uniformity should be based on the rather concrete information
that all values from [0, 1] are equally likely.


Simulation. The inverse distribution function method. The following property of the 
uniform distribution is very important for the simulation of r.v.’s. 

Let F(x) be a d.f., and let $F^{-1}(y)$ be its inverse. If for some y, there are many x's for
which F(x) = y or there is no such x, then we define $F^{-1}(y)$ similarly to what we did in
Section 1.3.4. Namely, $F^{-1}(y)$ is a number x such that $F(x - \varepsilon) \le y$ and $F(x + \varepsilon) \ge y$ for
any ε > 0. The definition is illustrated in Fig.11.

FIGURE 11. The inverse of a distribution function. (The letter g is chosen since e
stands for the natural e, and f for the density.)


The reader familiar with the notion of supremum may realize that the above inverse is
$F^{-1}(y) = \sup\{x : F(x) < y\}$.


Proposition 7 Let Z be uniformly distributed on [0,1]. Then the r.v. $X = F^{-1}(Z)$ has
the distribution function F(x).


Proof. As we saw above, P(Z ≤ z) = z for z ∈ [0,1]. On the other hand, for any x we
have $P(X \le x) = P(F^{-1}(Z) \le x) = P(Z \le F(x)) = F(x)$, since 0 ≤ F(x) ≤ 1. ∎


EXAMPLE 1. Let X take on values from [0,2] and have the d.f. $F(x) = x^3/8$ for x ∈ [0,2].
We want to simulate values of X, for instance, using Excel. Since $F^{-1}(y) = 2y^{1/3}$,
we may represent X as $X = F^{-1}(Z) = 2Z^{1/3}$, where Z is uniform on [0,1]. In the Excel
worksheet in Fig.12, five values of Z are simulated in Column A with use of the Excel random
number generator. The corresponding values of X are in Column B. For example,
B1=2*A1^(1/3).

FIGURE 12. (Excel worksheet: values of the r.v. Z in Column A, values of the r.v. X in Column B.)
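
The same simulation can be carried out outside a spreadsheet. The following is a minimal sketch of the inverse distribution function method for this example; the use of Python's standard `random` module, the seed, the sample size, and the function names are assumptions made for illustration only.

```python
import random

def inverse_cdf(y: float) -> float:
    """F^{-1}(y) = 2 * y**(1/3) for the d.f. F(x) = x**3 / 8 on [0, 2]."""
    return 2.0 * y ** (1.0 / 3.0)

def simulate(n: int, seed: int = 0) -> list:
    """Simulate n values of X = F^{-1}(Z), where Z is uniform on [0, 1)."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    return [inverse_cdf(rng.random()) for _ in range(n)]

sample = simulate(100_000)

# Empirical checks: P(X <= 1) = F(1) = 1/8, and E{X} = 3/2
# (the density is 3x^2/8 on [0, 2]).
frac_below_1 = sum(x <= 1.0 for x in sample) / len(sample)
sample_mean = sum(sample) / len(sample)
```

The empirical fraction `frac_below_1` should be close to 1/8 and `sample_mean` close to 3/2, up to sampling error.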


3.2.2 The exponential distribution 


We call a continuous positive r.v. $X_1$ and its distribution standard exponential if the
corresponding density is $f_1(x) = e^{-x}$ for x ≥ 0. It is straightforward to compute that
$E\{X_1\} = \int_0^{\infty} x e^{-x}\,dx = 1$, $E\{X_1^2\} = \int_0^{\infty} x^2 e^{-x}\,dx = 2$, and hence, $\mathrm{Var}\{X_1\} = 1$.

Consider now the r.v. $X = X_a = X_1/a$ for a positive a. In accordance with (2.6.1), the
density of $X_a$ is

$$f_a(x) = \begin{cases} 0 & \text{if } x < 0, \\ a e^{-ax} & \text{if } x \ge 0, \end{cases} \tag{3.2.3}$$




and
$$E\{X_a\} = 1/a, \quad \mathrm{Var}\{X_a\} = 1/a^2. \tag{3.2.4}$$

Such a r.v. and its distribution are called exponential with a parameter a which, as we see,
is a scale parameter: $X_a = X_1/a$.
By (1.3.9), we readily get that the d.f.
$$F_a(x) = 1 - e^{-ax} \quad \text{for } x \ge 0. \tag{3.2.5}$$
Consequently,
$$P(X_a > x) = e^{-ax}, \tag{3.2.6}$$
from which the term "exponential" comes.
The exponential distribution has the unique 
Lack-of-Memory (or Memoryless) Property: for any x,y > 0, 
P(X >x+y|X >x)=P(X >y). (3.2.7) 
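Indeed,
$$P(X > x+y \mid X > x) = \frac{P(X > x+y,\ X > x)}{P(X > x)} = \frac{P(X > x+y)}{P(X > x)}. \tag{3.2.8}$$
Then, in view of (3.2.6),
$$P(X > x+y \mid X > x) = \frac{e^{-a(x+y)}}{e^{-ax}} = e^{-ay} = P(X > y).$$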


Thus, if X has exceeded a certain level x, the overshoot (over x) has the same distribution 
as the r.v. X itself, and does not depend on the particular level x that was exceeded. We 
comment on this property in more detail in Section 2.1.1. 


3.2.3 The Γ(gamma)-distribution


First, for all ν > 0, we define the Γ(gamma)-function
$$\Gamma(\nu) = \int_0^{\infty} x^{\nu-1} e^{-x}\,dx.$$

It is easy to verify that $\Gamma(1) = \int_0^{\infty} e^{-x}\,dx = 1$ and, using integration by parts, that
$$\Gamma(\nu+1) = \nu\,\Gamma(\nu). \tag{3.2.9}$$
Then for an integer k ≥ 1, we have
$\Gamma(k+1) = k\Gamma(k) = k(k-1)\Gamma(k-1) = \ldots = k(k-1)\cdots 1\cdot\Gamma(1) = k!$. Thus,
$$\Gamma(k+1) = k!$$
for k = 0,1,.... So, the Γ-function may be viewed as a generalization of the notion of
factorial.


FIGURE 13. The Γ-densities for ν = 1, 2, 3: $f(x) = e^{-x}$, $f(x) = x e^{-x}$, and $f(x) = \frac{1}{2}x^2 e^{-x}$.


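Consider, first, a continuous r.v. $X_{1\nu}$ whose density is the function

$$f_{1\nu}(x) = \frac{1}{\Gamma(\nu)} x^{\nu-1} e^{-x} \quad \text{for } x \ge 0, \text{ and } = 0 \text{ otherwise.}$$

Due to Γ(ν) in the denominator, $\int_0^{\infty} f_{1\nu}(x)\,dx$ is indeed equal to one. For ν = 1, this is the
standard exponential density. The parameter ν characterizes the type of the distribution. In
Fig.13, we demonstrate how this type depends on ν by sketching the graphs of $f_{1\nu}(x)$ for
ν = 1,2,3.
The kth moment

$$E\{X_{1\nu}^k\} = \int_0^{\infty} x^k \frac{1}{\Gamma(\nu)} x^{\nu-1} e^{-x}\,dx = \frac{1}{\Gamma(\nu)} \int_0^{\infty} x^{\nu+k-1} e^{-x}\,dx = \frac{\Gamma(\nu+k)}{\Gamma(\nu)}$$
$$= \frac{(\nu+k-1)\Gamma(\nu+k-1)}{\Gamma(\nu)} = \frac{(\nu+k-1)(\nu+k-2)\Gamma(\nu+k-2)}{\Gamma(\nu)} = \ldots$$
$$= \frac{(\nu+k-1)\cdots\nu\,\Gamma(\nu)}{\Gamma(\nu)} = (\nu+k-1)\cdots\nu,$$

by virtue of (3.2.9). In particular,
$$E\{X_{1\nu}\} = \nu, \quad E\{X_{1\nu}^2\} = (\nu+1)\nu, \quad \text{and hence} \quad \mathrm{Var}\{X_{1\nu}\} = \nu. \tag{3.2.10}$$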


Now, let a > 0 and the r.v. $X_{a\nu} = X_{1\nu}/a$. By (2.6.1), the density of $X_{a\nu}$ is

$$f_{a\nu}(x) = \frac{a^{\nu}}{\Gamma(\nu)} x^{\nu-1} e^{-ax} \quad \text{for } x \ge 0, \text{ and } = 0 \text{ otherwise.} \tag{3.2.11}$$

The distribution defined and its density $f_{a\nu}(x)$ are called the Γ(Gamma)-distribution and
Γ-density, respectively, with parameters a and ν. As we saw, a is just a scale parameter,
while parameter ν may be called essential since it specifies the type of the distribution.
From (3.2.10) it follows that

$$E\{X_{a\nu}\} = \frac{\nu}{a}, \quad E\{X_{a\nu}^2\} = \frac{(\nu+1)\nu}{a^2}, \quad \text{and} \quad \mathrm{Var}\{X_{a\nu}\} = \frac{\nu}{a^2}. \tag{3.2.12}$$




3.2.4 The normal distribution 


Even the name of this distribution points out the important role which it plays in theory 
and applications. This distribution is called also Gaussian, and even the bell distribution 
(to emphasize the shape of its density curve), but the last term is not used in Probability 
Theory itself. 

A r.v. X and its distribution are called standard normal if the corresponding density is
the function

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. \tag{3.2.13}$$

The function φ(x) is called the standard normal density; its
graph for -3 ≤ x ≤ 3 is presented in Fig.14.

FIGURE 14.

Since density (3.2.13) is an even function, the integrand in
the integral $\int_{-\infty}^{\infty} x\varphi(x)\,dx$ is an odd function, and hence this integral
equals zero. Integrating by parts, one can compute that
$\int_{-\infty}^{\infty} x^2\varphi(x)\,dx = 1$. Thus,

$$E\{X\} = 0, \quad \mathrm{Var}\{X\} = 1, \tag{3.2.14}$$

from which the term "standard" comes.
Let m be an arbitrary number, and let σ > 0. The r.v. Y = m + σX and its distribution
are called (m,σ²)-normal. In view of (3.2.14) and (2.6.1),

$$E\{Y\} = m, \quad \mathrm{Var}\{Y\} = \sigma^2, \tag{3.2.15}$$

which justifies the choice of the notation m and σ². By the same rule (2.6.1), the density of
Y is the function

$$\varphi_{m\sigma}(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-m)^2}{2\sigma^2}\right\}. \tag{3.2.16}$$


In accordance with (1.3.9), the d.f. of X is equal to

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy.$$

The function Φ(x) is called the standard normal d.f. The integral above cannot be computed
analytically, but any particular value of Φ(x) may be computed to a high degree of accuracy
using numerical integration. (See also Table 1 in Section 2 in Appendix.)
Using again (2.6.1), we get that the d.f. of Y is the function

$$\Phi_{m\sigma}(x) = \Phi\left(\frac{x-m}{\sigma}\right). \tag{3.2.17}$$

In Section 2.2.1.2, we show that the sum of independent $(m_1,\sigma_1^2)$- and $(m_2,\sigma_2^2)$-normal
r.v.'s is $(m_1+m_2,\ \sigma_1^2+\sigma_2^2)$-normal.

The importance of the normal distribution is explained, first of all, by the central limit 
theorem that we consider in Section 6.2. 

We present also the following well known and useful estimate for Φ(x) (see, e.g., [38],
[102], [116], [122]): for any x > 0

$$x^{-1}(1 - x^{-2})\,\varphi(x) \le 1 - \Phi(x) = \Phi(-x) \le x^{-1}\varphi(x). \tag{3.2.18}$$
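
The two-sided estimate (3.2.18) is easy to check numerically. The sketch below is illustrative: it assumes that the tail 1 - Φ(x) may be computed through the complementary error function as erfc(x/√2)/2, and the test points and function names are arbitrary choices.

```python
import math

def phi(x: float) -> float:
    """Standard normal density (3.2.13)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def upper_tail(x: float) -> float:
    """1 - Phi(x), computed via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bounds_hold(x: float) -> bool:
    """Check both inequalities of (3.2.18) at a point x > 0."""
    lower = (1.0 / x) * (1.0 - 1.0 / x ** 2) * phi(x)
    upper = phi(x) / x
    return lower <= upper_tail(x) <= upper

checks = {x: bounds_hold(x) for x in (0.5, 1.0, 2.0, 4.0, 8.0)}
```

For small x the lower bound is negative and hence uninformative; the estimate becomes sharp for large x, where the two bounds are close to each other.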




4 MOMENT GENERATING FUNCTIONS 
4.1 Laplace transform 


The Laplace transform of a r.v. X and its distribution F is the function
$$M_X(z) = E\{e^{zX}\} = \int_{-\infty}^{\infty} e^{zx}\,dF(x), \tag{4.1.1}$$

defined for all z for which the above expectation exists. In general, the argument z is a
complex number. When it cannot cause confusion, we will omit the index X in $M_X(z)$.

If z is a real number, the Laplace transform is also called a moment generating function 
(m.g.f.). We will use the latter term since it is customary in actuarial modeling. The letter 
M comes from the word “moment”. 

The terminology chosen is related to the following fact. As above, denote by $m_k$ the
kth moment $E\{X^k\}$. Making use of the Taylor expansion of the exponential function (see
(4.2.5) in Appendix), we can write

$$M_X(z) = E\{e^{zX}\} = E\left\{\sum_{k=0}^{\infty} \frac{z^k X^k}{k!}\right\} = \sum_{k=0}^{\infty} \frac{z^k E\{X^k\}}{k!} = \sum_{k=0}^{\infty} \frac{m_k z^k}{k!}. \tag{4.1.2}$$

(We omit the formal justification of passing the expectation operation inside the sum.)
Thus, the m.g.f. $M_X(z)$ may be expanded into the series in powers of z, where the kth
coefficient equals the kth moment divided by k!. Thus, $M_X^{(k)}(0) = m_k$. We consider this
issue in more detail in Sections 4.4-4.5.
It is worth emphasizing that for real z's, the integral in (4.1.1) may not exist. Therefore,
m.g.f.'s do not exist for all r.v.'s and/or for all values of z; see examples in Section 4.3
below. Certainly for z = 0, we can always write that
$$M_X(0) = E\{e^{0\cdot X}\} = E\{1\} = 1. \tag{4.1.3}$$


If z = it, where the imaginary unit $i = \sqrt{-1}$, the Laplace transform is equal to
$$M_X(it) = E\{e^{itX}\} = \int_{-\infty}^{\infty} e^{itx}\,dF(x),$$

and is called the characteristic function of a r.v. X. The same function is called also the
characteristic function of F, or the Fourier transform of F. Since $|e^{itx}| = 1$, the characteristic
function always exists. Indeed, for the last integral

$$\left|\int_{-\infty}^{\infty} e^{itx}\,dF(x)\right| \le \int_{-\infty}^{\infty} |e^{itx}|\,dF(x) = \int_{-\infty}^{\infty} dF(x) = 1,$$

that is, the integral is finite for any t.

If X assumes values 0,1,..., and $p_k = P(X = k)$, it is convenient to consider real z ≤ 0,
and set $s = e^z$, which is not larger than one. Then $E\{e^{zX}\} = E\{s^X\}$, where 0 < s ≤ 1.
Denoting the last function by $G_X(s)$, we can write

$$G_X(s) = E\{s^X\} = s^0 p_0 + s^1 p_1 + s^2 p_2 + \ldots = \sum_{k=0}^{\infty} p_k s^k.$$




The function $G_X(s)$ is called a probability generating function (or simply a generating
function). The coefficients in its expansion in powers of s are just the corresponding
probabilities $p_k$.


The Laplace transform in its different versions has proven to be a powerful weapon for 
solving a wide variety of problems. There are several reasons for this, but possibly the 
primary reason is connected with the following property: 


For any independent r.v.'s $X_1$ and $X_2$,
$$M_{X_1+X_2}(z) = M_{X_1}(z)\,M_{X_2}(z) \tag{4.1.4}$$
for all z for which the Laplace transforms above are well defined.


Indeed, by property (2.1.8),
$$M_{X_1+X_2}(z) = E\{e^{z(X_1+X_2)}\} = E\{e^{zX_1} e^{zX_2}\} = E\{e^{zX_1}\}\,E\{e^{zX_2}\} = M_{X_1}(z)\,M_{X_2}(z).$$
Before considering examples, we establish two elementary properties and one non-elementary
property.
A. The Laplace transform of a linear transformation.
For Y = a + bX, the Laplace transform
$$M_Y(z) = e^{az} M_X(bz). \tag{4.1.5}$$
Indeed, $M_Y(z) = E\{e^{z(a+bX)}\} = E\{e^{az} e^{bzX}\} = e^{az} E\{e^{bzX}\} = e^{az} M_X(bz)$.


B. The Laplace transform of a mixture (or linear combination) of distributions is equal
to the mixture (or the linear combination) of the Laplace transforms. More precisely,
let $F_1$, $F_2$ be two distributions, and $F = \alpha F_1 + (1-\alpha)F_2$, where 0 ≤ α ≤ 1 [see
Section 1.3.5]. Let $M_1(z)$, $M_2(z)$, $M(z)$ be the Laplace transforms of the distributions
$F_1$, $F_2$, $F$, respectively. Then

$$M(z) = \alpha M_1(z) + (1-\alpha) M_2(z).$$

This immediately follows from (4.1.1):

$$M(z) = \int_{-\infty}^{\infty} e^{zx}\,dF(x) = \int_{-\infty}^{\infty} e^{zx}\,d\big(\alpha F_1(x) + (1-\alpha)F_2(x)\big) = \alpha \int_{-\infty}^{\infty} e^{zx}\,dF_1(x) + (1-\alpha) \int_{-\infty}^{\infty} e^{zx}\,dF_2(x).$$

The non-elementary property mentioned is the uniqueness property: r.v.’s with different 
distributions have different Laplace transforms. The situation is similar when we consider 
only real z’s, that is, moment generating functions. However, in this case, we will always 
assume that the m.g.f.’s under consideration exist at least for all z such that |z| < c, where 
c is a positive constant. 


Theorem 8 R.v.’s with distinct distributions have distinct m.g.f.’s. 


In other words, if for two r.v.'s, X and Y, with the d.f.'s $F_X(x)$ and $F_Y(x)$, respectively,
$M_X(z) = M_Y(z)$ for all z in a neighborhood of zero, then $F_X(x) = F_Y(x)$ for all x.




4.2 An example when a m.g.f. does not exist 


Let a r.v. X have the density $f(x) = 1/(2x^2)$ for |x| ≥ 1, and = 0 otherwise. The reader
may verify that indeed $\int_{-\infty}^{\infty} f(x)\,dx = 1$. If the m.g.f. had existed, it would have been equal to

$$\int_{-\infty}^{\infty} e^{zx} f(x)\,dx = \frac{1}{2} \int_{-\infty}^{-1} \frac{1}{x^2}\, e^{zx}\,dx + \frac{1}{2} \int_{1}^{\infty} \frac{1}{x^2}\, e^{zx}\,dx.$$

If z > 0, the first integral converges, because there zx < 0, and hence $e^{zx} < 1$. However, the
second integral diverges, since for large x the function $e^{zx}$ grows faster than, say, $x^2$.
The proof may be found in practically any Calculus textbook; see, e.g., [136]. We omit the
details. If z < 0, the second integral is finite, while the first diverges. So, the m.g.f. exists
only for z = 0.


4.3 The m.g.f.’s of basic distributions 


Next, we consider the m.g.f.'s of the basic distributions from Section 3. Table 2 in
Appendix, Section 1 presents a summary of the results below.
In all formulas below, z is real. 


4.3.1 The binomial distribution 


The easiest way to find the m.g.f. of a binomial r.v. X is to use the representation
$$X = X_1 + \ldots + X_n,$$

where the r.v.'s $X_i$ are independent and equal 1 or 0 with probabilities p and q, respectively;
see Section 3.1.1. The m.g.f. of each $X_i$ is

$$M_{X_i}(z) = e^{z\cdot 1} p + e^{z\cdot 0} q = e^{z} p + q = 1 + p(e^{z} - 1),$$

since q = 1 - p. By property (4.1.4), the m.g.f. of X is the product of the m.g.f.'s of the $X_i$'s,
so

$$M_X(z) = (e^{z} p + q)^n = \big(1 + p(e^{z} - 1)\big)^n.$$
For the corresponding generating function, setting $s = e^z$, we have

$$G_X(s) = (sp + q)^n.$$


4.3.2 The geometric and negative binomial distributions 


For the geometric distribution in the form (3.1.9),

$$M_K(z) = \sum_{k=0}^{\infty} e^{zk} p q^{k} = p \sum_{k=0}^{\infty} (e^{z} q)^{k} = \frac{p}{1 - qe^{z}}. \tag{4.3.1}$$
(In the last step, we used formula (4.2.10) for a geometric series.)
To get the m.g.f. for the r.v. N = K + 1 (see Section 3.1.3), we use (4.1.5), which leads to
$M_N(z) = e^{z} M_K(z)$; see also Table 2 in Section 1 in Appendix.




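In general, for the negative binomial distribution (3.1.15), using expansion (4.2.9) from
Appendix, we have

$$M_{K_\nu}(z) = \sum_{m=0}^{\infty} e^{zm} \binom{\nu+m-1}{m} p^{\nu} q^{m} = p^{\nu} \sum_{m=0}^{\infty} \binom{\nu+m-1}{m} (qe^{z})^{m} = p^{\nu} (1 - qe^{z})^{-\nu} = \left(\frac{p}{1 - qe^{z}}\right)^{\nu}. \tag{4.3.2}$$

The generating function
$$G_{K_\nu}(s) = \left(\frac{p}{1 - qs}\right)^{\nu}.$$

To get the result for $N_\nu = K_\nu + \nu$, we again use (4.1.5), which leads to $M_{N_\nu}(z) = e^{z\nu} M_{K_\nu}(z)$.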


4.3.3 The Poisson distribution 
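For the Poisson distribution with parameter λ, the m.g.f.

$$M(z) = \sum_{k=0}^{\infty} e^{zk} e^{-\lambda} \frac{\lambda^{k}}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^{z})^{k}}{k!} = e^{-\lambda} e^{\lambda e^{z}} = \exp\{\lambda(e^{z} - 1)\}, \tag{4.3.3}$$

by virtue of the Taylor expansion (4.2.5) from Appendix.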


Next we consider continuous distributions. 


4.3.4 The uniform distribution 
For the distribution with the density (3.2.1), the m.g.f.

$$M(z) = \int_a^b e^{zx} \frac{1}{b-a}\,dx = \frac{e^{zb} - e^{za}}{z(b-a)} \tag{4.3.4}$$

if z ≠ 0. For z = 0 the last expression is not defined, but we can set M(0) = 1 in view of
(4.1.3). Note that the limit of the r.-h.s. of (4.3.4) as z → 0 equals one (one can use, for
example, L'Hôpital's rule).
4.3.5 The exponential and gamma distributions 

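For a r.v. X, let the density $f(x) = ae^{-ax}$ for x ≥ 0. Then the m.g.f.

$$M(z) = \int_0^{\infty} e^{zx} a e^{-ax}\,dx = \frac{a}{a-z} = \frac{1}{1 - z/a} \tag{4.3.5}$$

for z < a. It is important to emphasize that for z ≥ a the m.g.f. does not exist.


In general, for the density (3.2.11) and z < a, making the change of variable y = (a - z)x,
we have

$$M(z) = \int_0^{\infty} e^{zx} f_{a\nu}(x)\,dx = \frac{a^{\nu}}{\Gamma(\nu)} \int_0^{\infty} e^{zx} x^{\nu-1} e^{-ax}\,dx = \frac{a^{\nu}}{\Gamma(\nu)} \int_0^{\infty} x^{\nu-1} e^{-(a-z)x}\,dx$$
$$= \frac{a^{\nu}}{\Gamma(\nu)(a-z)^{\nu}} \int_0^{\infty} y^{\nu-1} e^{-y}\,dy = \frac{a^{\nu}}{(a-z)^{\nu}} = \left(\frac{a}{a-z}\right)^{\nu} = \frac{1}{(1 - z/a)^{\nu}},$$

again provided that z < a.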


4.3.6 The normal distribution 
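For a standard normal r.v. X,

$$M_X(z) = \int_{-\infty}^{\infty} e^{zx} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\{zx - x^2/2\}\,dx$$
$$= (2\pi)^{-1/2} \int_{-\infty}^{\infty} \exp\left\{\tfrac{1}{2}\big(z^2 - (x-z)^2\big)\right\} dx = (2\pi)^{-1/2} e^{z^2/2} \int_{-\infty}^{\infty} \exp\left\{-\tfrac{1}{2}(x-z)^2\right\} dx$$

(in the third step we completed the square). With the change of variable y = x - z, we have
$$M_X(z) = (2\pi)^{-1/2} e^{z^2/2} \int_{-\infty}^{\infty} \exp\{-y^2/2\}\,dy = e^{z^2/2} \int_{-\infty}^{\infty} (2\pi)^{-1/2} \exp\{-y^2/2\}\,dy.$$

The integrand in the last integral is the standard normal density. Hence, the integral itself
equals one, and
$$M_X(z) = e^{z^2/2}.$$

For the (m,σ²)-normal r.v. Y = m + σX, by virtue of (4.1.5),
$$M_Y(z) = e^{mz} M_X(\sigma z) = \exp\{mz + \sigma^2 z^2/2\}. \tag{4.3.6}$$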
4.4 The moment generating function and moments 
Consider a r.v. X with a d.f. F(x), and assume that for some $c_0 > 0$,
$$E\{e^{|zX|}\} < \infty \quad \text{for } |z| < c_0. \tag{4.4.1}$$

Since $E\{e^{zX}\} \le E\{e^{|zX|}\}$, condition (4.4.1) ensures that the m.g.f.
$$M(z) = E\{e^{zX}\} = \int_{-\infty}^{\infty} e^{zx}\,dF(x)$$

is well defined for all z from the interval $(-c_0, c_0)$.

Clearly, M(0) = 1. 

It may be proved that under condition (4.4.1), we can differentiate M(z) an arbitrary 
number of times, and we can do that by passing the operation of differentiation through the 
integral. 

In particular, differentiating M(z) once, we get

$$M'(z) = E\{Xe^{zX}\} = \int_{-\infty}^{\infty} x e^{zx}\,dF(x). \tag{4.4.2}$$

Hence,
$$M'(0) = E\{X\}.$$

Differentiating (4.4.2), we have

$$M''(z) = E\{X^2 e^{zX}\} = \int_{-\infty}^{\infty} x^2 e^{zx}\,dF(x), \tag{4.4.3}$$




and
$$M''(0) = E\{X^2\}.$$

Continuing in the same fashion, we get that the kth derivative
$$M^{(k)}(z) = E\{X^k e^{zX}\} = \int_{-\infty}^{\infty} x^k e^{zx}\,dF(x), \tag{4.4.4}$$

and for all k
$$M^{(k)}(0) = m_k = E\{X^k\}, \tag{4.4.5}$$
the kth moment of X.


From (4.4.3) it follows, in particular, that $M''(z) = E\{X^2 e^{zX}\} \ge 0$ for all z, and hence

M(z) is always convex. (4.4.6)


▷ As a matter of fact, to show the convexity of M(z), we do not have to consider derivatives.
Since $e^{zx}$, as a function of z, is convex, we can use the counterpart (for convexity) of
Definition 1 from Section 4.3 in Appendix, and write that

$$\frac{M(z_1) + M(z_2)}{2} = E\left\{\frac{e^{z_1 X} + e^{z_2 X}}{2}\right\} \ge E\left\{\exp\left\{\frac{z_1 + z_2}{2}\,X\right\}\right\} = M\left(\frac{z_1 + z_2}{2}\right),$$

which implies the convexity of M(z). ◁


EXAMPLE 1. As was promised in Section 3.1.4, we compute the mean and the variance
of the negative binomial r.v. $K_\nu$. By (4.3.2), the corresponding m.g.f. $M(z) = [p/(1 - qe^{z})]^{\nu}$.
Then

$$M'(z) = p^{\nu} \nu q e^{z} / (1 - qe^{z})^{\nu+1},$$
and

$$M''(z) = p^{\nu} \nu q \left(\frac{e^{z}}{(1 - qe^{z})^{\nu+1}} + \frac{e^{2z} q(\nu+1)}{(1 - qe^{z})^{\nu+2}}\right).$$

Since q = 1 - p, we have $E\{K_\nu\} = M'(0) = p^{\nu}\nu q/(1-q)^{\nu+1} = \nu q/p$, and
$E\{K_\nu^2\} = M''(0) = p^{\nu}\nu q\,[p^{-\nu-1} + q(\nu+1)p^{-\nu-2}] = \nu q p^{-2}[p + q(\nu+1)]$.
This readily implies that $\mathrm{Var}\{K_\nu\} = E\{K_\nu^2\} - (E\{K_\nu\})^2 = \nu q p^{-2}$, so we have
arrived at (3.1.16).
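
The computation in this example can be cross-checked numerically by differentiating the m.g.f. at zero. The sketch below is an illustration only: it approximates M'(0) and M''(0) by central finite differences rather than by the exact formulas above, and the values p = 0.4, ν = 2.5, the step h, and the function names are assumptions.

```python
import math

def mgf(z: float, p: float, nu: float) -> float:
    """M(z) = (p / (1 - q e^z))**nu from (4.3.2); requires q * e^z < 1."""
    q = 1.0 - p
    return (p / (1.0 - q * math.exp(z))) ** nu

def moments_from_mgf(p: float, nu: float, h: float = 1e-4):
    """Approximate M'(0) and M''(0) by central finite differences."""
    m1 = (mgf(h, p, nu) - mgf(-h, p, nu)) / (2.0 * h)
    m2 = (mgf(h, p, nu) - 2.0 * mgf(0.0, p, nu) + mgf(-h, p, nu)) / (h * h)
    return m1, m2

p, nu = 0.4, 2.5                      # illustrative values
m1, m2 = moments_from_mgf(p, nu)
variance = m2 - m1 ** 2               # should be close to nu*q/p**2
```

For these values, νq/p = 3.75 and νq/p² = 9.375, and the finite-difference estimates agree with them up to discretization error.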


4.5 Expansions for m.g.f.’s 
4.5.1 Taylor’s expansions for m.g.f.’s 


The reader may look over general facts on Taylor’s expansions in Section 4.2 in Ap- 
pendix. 

We also use below, for the first time in this book, the common Calculus notation o(x)
("little o" notation). Since it is not always used in introductory Calculus courses, we explain
its significance in two pages of a special Section 4.1 in Appendix. If the reader is not
familiar with the o(·) notation, it is worth the time required to understand it. It is
convenient, explicit, and saves time in a great many calculations.

In short, o(x) is a function such that [o(x)/x] → 0 as x → 0. We view o(x) as a term
negligible in comparison with x for small x's.

First, we state that under condition (4.4.1), the expansion

$$M(z) = \sum_{k=0}^{\infty} \frac{m_k z^k}{k!} \tag{4.5.1}$$

is true, provided $|z| < c_0$. We skip the proof of this fact and the other formal operations
below.
Secondly, making use of the general Taylor formula (see (4.2.3) in Appendix) and (4.4.5),
we have

$$M(z) = 1 + m_1 z + \frac{m_2 z^2}{2} + \ldots + \frac{m_n z^n}{n!} + o(z^n). \tag{4.5.2}$$

In particular,

$$M(z) = 1 + m_1 z + \frac{m_2 z^2}{2} + o(z^2). \tag{4.5.3}$$

4.5.2 Cumulants 


Set K(z) = ln(M(z)). If M(z) admits the Taylor expansion, so does K(z); we skip here
formalities. Since K(0) = ln(M(0)) = 0, the Taylor expansion for K(z) will look as follows:

$$K(z) = \ln(M(z)) = \sum_{i=1}^{\infty} \frac{\kappa_i z^i}{i!}, \tag{4.5.4}$$

where the $\kappa_i$'s are the corresponding coefficients, more precisely, the corresponding derivatives
of K(z) at zero. The quantity $\kappa_i$ is called the ith cumulant of the r.v. X (and its distribution).
The significance of these characteristics and their usefulness is connected with the
following instance.
Let $M_1(z)$ and $M_2(z)$ be the m.g.f.'s of independent r.v.'s $X_1$ and $X_2$, and M(z) be the
m.g.f. of $X_1 + X_2$. Denote by $K_1(z)$, $K_2(z)$, and K(z) the logarithms of the corresponding
m.g.f.'s. Then taking the logarithms in (4.1.4), we get

$$K(z) = K_1(z) + K_2(z). \tag{4.5.5}$$

The last relation turns out to be convenient in many problems. Combining (4.5.4) and
(4.5.5), we see that
$$\kappa_i = \kappa_{1i} + \kappa_{2i}, \tag{4.5.6}$$

where $\kappa_{1i}$, $\kappa_{2i}$, and $\kappa_i$ are the ith cumulants of $X_1$, $X_2$, and $X_1 + X_2$, respectively.


Let us now return to (4.5.4). As a simple example, consider the case n = 2. It will also
be a good exercise on the use of the notation o(·). By virtue of (4.2.8) in Appendix and
(4.5.3), for a r.v. X and its m.g.f. M(z),

$$K(z) = \ln[M(z)] = \ln[1 + (M(z) - 1)] = (M(z) - 1) - \frac{1}{2}(M(z) - 1)^2 + o\big((M(z) - 1)^2\big)$$
$$= m_1 z + \frac{m_2 z^2}{2} + o(z^2) - \frac{1}{2}\left(m_1 z + \frac{m_2 z^2}{2} + o(z^2)\right)^2 + o\left(\left(m_1 z + \frac{m_2 z^2}{2} + o(z^2)\right)^2\right)$$
$$= m_1 z + \frac{m_2 z^2}{2} + o(z^2) - \frac{1}{2}\, m_1^2 z^2 + o(z^2) + o(z^2) = m_1 z + \frac{m_2 - m_1^2}{2}\, z^2 + o(z^2).$$

Since $m_2 - m_1^2 = \sigma^2$, the variance of X, we have

$$K(z) = mz + \frac{\sigma^2 z^2}{2} + o(z^2), \tag{4.5.7}$$

where $m = m_1 = E\{X\}$.
Comparing it with (4.5.4), for the first two cumulants we have

$$\kappa_1 = m, \quad \kappa_2 = \sigma^2.$$

In this case, rule (4.5.6) coincides with the corresponding rules for the mean and the
variance of sums of r.v.'s.
A good exercise is, using the same little-o technique, to show that the third cumulant

$$\kappa_3 = \mu_3 = E\{(X - m)^3\}.$$
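
The relations for the first three cumulants can be illustrated numerically. The sketch below takes, as an assumed test case, the exponential distribution with parameter a = 2, for which (4.3.5) gives K(z) = ln(a/(a - z)), and approximates the derivatives of K(z) at zero by central finite differences; the step h and the function names are illustrative choices.

```python
import math

def K(z: float, a: float) -> float:
    """Logarithm of the exponential m.g.f. a / (a - z), valid for z < a."""
    return math.log(a / (a - z))

def first_three_cumulants(a: float, h: float = 1e-2):
    """Finite-difference approximations of K'(0), K''(0), K'''(0)."""
    k1 = (K(h, a) - K(-h, a)) / (2 * h)
    k2 = (K(h, a) - 2 * K(0.0, a) + K(-h, a)) / h ** 2
    k3 = (K(2 * h, a) - 2 * K(h, a) + 2 * K(-h, a) - K(-2 * h, a)) / (2 * h ** 3)
    return k1, k2, k3

a = 2.0                               # illustrative scale parameter
k1, k2, k3 = first_three_cumulants(a)
# Known values for the exponential distribution:
# mean 1/a, variance 1/a**2, third central moment 2/a**3.
```

For a = 2 the three approximations are close to 0.5, 0.25, and 0.25, that is, to the mean, the variance, and the third central moment of this distribution.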


5 CONVERGENCE OF RANDOM VARIABLES 
AND DISTRIBUTIONS 


Consider a sample space Ω = {ω}, a class of events 𝒜, a probability measure P(A)
defined on events A from 𝒜, and r.v.'s X which were defined as functions X(ω) on Ω.
We say that a sequence of r.v.'s $X_n = X_n(\omega)$ converges to a r.v. X = X(ω) almost surely,
or with probability one, if
$$P(X_n \to X) = 1.$$

More precisely, this means that the set of all ω's for which $X_n(\omega) \to X(\omega)$ has probability
one. In this case, we write $X_n \xrightarrow{a.s.} X$.

In many models, such convergence either does not take place, or if it does, it is difficult
to prove. On the other hand, in many applications, it is sufficient to consider a weaker type
of convergence when we just require $X_n - X$ to be small for large n with a probability close
to one. We will now translate this heuristic definition into mathematical terms.

Saying that $X_n - X$ is small, we mean that $|X_n - X|$ is less than a sufficiently small positive
number ε. So, we want $|X_n - X|$ to be less than ε for large n with a large probability.

Saying that a probability is large, we mean that it is close to one, and saying "for large
n" we mean that n → ∞. We are ready to give a formal definition.


FIGURE 15.


A sequence of r.v.'s $X_n$ converges to a r.v. X in probability if for any arbitrarily small ε > 0
$$P(|X_n - X| \le \varepsilon) \to 1 \quad \text{as } n \to \infty. \tag{5.1}$$

In this case, we write $X_n \xrightarrow{P} X$. Note that (5.1) is equivalent to the relation
$$P(|X_n - X| > \varepsilon) \to 0 \quad \text{as } n \to \infty. \tag{5.2}$$


It may be proved that convergence almost surely implies convergence in probability; see, 
e.g., [27], [129], [122, pp 95-100]. 


Later, we will also need the definition of convergence to infinity. Following the same
logic, we say that $X_n \xrightarrow{P} \infty$, that is, $X_n$ converges to infinity in probability, if for any
arbitrarily large number k > 0,

$$P(X_n > k) \to 1 \quad \text{as } n \to \infty. \tag{5.3}$$

We say that $X_n \xrightarrow{P} -\infty$, that is, $X_n$ converges to negative infinity in probability, if for any
arbitrarily large k > 0,
$$P(X_n < -k) \to 1 \quad \text{as } n \to \infty. \tag{5.4}$$

The next type of convergence is weaker and is connected with the proximity of distributions.

We say that a sequence of distributions $F_n$ weakly converges to a distribution F, writing
it as $F_n \xrightarrow{w} F$, or simply $F_n \to F$, if the d.f.'s

$F_n(x) \to F(x)$ for all x at which the d.f. F(x) is continuous.


We consider only points of continuity of F(x) because otherwise the above definition 
would not cover very natural types of convergence. 


EXAMPLE 1. Let $F_n$ be the distribution of a r.v. $X_n(\omega) = 1/n$, and let F be the distribution
of a r.v. X(ω) = 0 (so each r.v. assumes only one value for all ω's). The graphs of the d.f.'s
$F_n(x)$ and F(x) are given in Fig.15. Clearly, $X_n(\omega) \to X(\omega)$ for all ω's. The reader can
readily verify also that $F_n(x) \to F(x)$ for any x ≠ 0. However, $F_n(0) = 0$ while F(0) = 1,
so $F_n(0) \not\to F(0)$. Since it would be very unnatural to claim in this case that the distribution
$F_n$ does not converge to F, we should just exclude the point zero from consideration. Note
that F(x) is not continuous at x = 0.


Certainly, if the limiting distribution F(x) is continuous, we consider convergence for 
all x’s. It is worth emphasizing that this is the case for the central limit theorem where 
the limiting distribution is the normal distribution, which is continuous (see Section 6.2 for 
precise statements and detail). 


We say that r.v.'s X and Y are equal in distribution, $X \stackrel{d}{=} Y$, if their distributions are equal.

We say that r.v.'s $X_n$ converge to X in distribution, writing $X_n \xrightarrow{d} X$, if the distributions
$F_{X_n} \to F_X$.

It may be proved that if X assumes only one value, the convergences $X_n \xrightarrow{P} X$ and $X_n \xrightarrow{d} X$
are equivalent, but in general this is not true. (See details, e.g., in [122, Sec.7-3.1].)

To understand the above statement, observe the simple fact that the equality of the distri- 
butions of r.v.’s does not imply the equality of the r.v.’s themselves. 


EXAMPLE 2. You play a game with a friend in which you pay $1 to your friend with
probability 1/2, and with the same probability your friend pays $1 to you (say, you toss a
coin). Clearly, your gain is the r.v. X = 1 or -1 with equal probabilities 1/2. The gain of
your friend is the r.v. Y = -1 or 1 with the same probabilities. The probability distributions
of the r.v.'s X and Y are, clearly, the same: $X \stackrel{d}{=} Y$, but the r.v.'s themselves are not equal to
each other since Y = -X.

If we now set $X_n = Y$ for all n, we will have $F_{X_n} = F_X$ (and hence $F_{X_n} \to F_X$), but the r.v.'s $X_n$
themselves do not converge to X.


▷ Weak convergence of distributions does not exhaust all interesting types of convergence.
Consider distributions $F_n(B) = P(X_n \in B)$ and $F(B) = P(X \in B)$, where sets B are
subsets of the real line. (See also Section 1.3.1 for definitions and a remark concerning
Borel sets.) When considering the convergence of d.f.'s, we deal with sets B = (-∞, x].

Note that the convergence of distribution functions may not imply the convergence for
all sets.


EXAMPLE 3. We divide the interval $[0,1]$ into $n$ equal parts, and set $x_{kn} = k/n$,
$k = 0, 1, \ldots, n-1$. Let $X_n$ be a r.v. taking values $x_{kn}$ with equal probabilities $1/n$, and $X$ be a
r.v. uniformly distributed on $[0,1]$. For any $x \in [0,1]$, the d.f. $F_n(x) = P(X_n \le x) = \frac{1}{n} \times$ [the
number of points $x_{kn} \le x$].

A good and not very difficult exercise is to prove that $F_n(x) \to x$, that is, $F_n(x)$ converges
to the length of the interval $[0,x]$, which, in turn, is the d.f. $F(x) = P(X \le x)$. Thus, $F_n \Rightarrow F$.

Let now $B$ be the set of all possible points $x_{kn}$, where $n = 1, 2, \ldots$, and $k = 0, 1, \ldots, n-1$.
Certainly, $F_n(B) = 1$, since all possible values of $X_n$ are in $B$. On the other hand, $F(B) = 0$.
Indeed, the uniform distribution is continuous, so for each point $x_{kn}$ the probability
$P(X = x_{kn}) = 0$. Since the number of all such points is countable, the probability $P(X \in B)$ is the
sum of all these zero probabilities, that is, zero.
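The first claim of Example 3 can be checked numerically. The sketch below is only an illustration: the function name `F_n` and the test point $x = 1/3$ are our own choices. It computes the d.f. of $X_n$ exactly with rational arithmetic and shows that its distance from $x$ does not exceed $1/n$.

```python
from fractions import Fraction

def F_n(x, n):
    """d.f. of X_n, which takes the values k/n, k = 0, ..., n-1, with probability 1/n each."""
    count = sum(1 for k in range(n) if Fraction(k, n) <= x)
    return Fraction(count, n)

x = Fraction(1, 3)  # an arbitrary point of [0, 1]
for n in (10, 100, 1000):
    gap = abs(F_n(x, n) - x)
    # The gap never exceeds 1/n, so F_n(x) -> x as n grows.
    print(n, float(F_n(x, n)), float(gap))
```

The second claim, $F_n(B) = 1$ while $F(B) = 0$, is not a numerical statement: it rests on the fact that the countable set $B$ has probability zero under any continuous distribution.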


So, if we want to have the convergence of distributions over all sets, we should require
more from the model. Assume, for example, that r.v.’s $X_n$ and $X$ have densities $f_n(x)$ and
$f(x)$, respectively. Then for any set $B$,

$$|F_n(B) - F(B)| = \left| \int_B f_n(x)\,dx - \int_B f(x)\,dx \right| \le \int_B |f_n(x) - f(x)|\,dx \le \int_{-\infty}^{\infty} |f_n(x) - f(x)|\,dx. \tag{5.5}$$


Thus, in the case of continuous r.v.’s, the convergence over all sets is connected with the
convergence of densities. In particular, for $F_n(B) \to F(B)$ for all $B$, it is sufficient that the
last integral in (5.5) converges to zero.

We restrict ourselves to a simple example. 


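EXAMPLE 4. Let $X_n$ be uniformly distributed on $[\frac{1}{n}, 1 + \frac{1}{n}]$, and let $X$ be uniform on
$[0,1]$. We prove that in this case we have convergence over all sets. Indeed, $f_n(x) = 1$ if
$x \in [\frac{1}{n}, 1 + \frac{1}{n}]$, and $= 0$ otherwise, while $f(x) = 1$ if $x \in [0,1]$, and $= 0$ for other $x$’s. Then

$$\int_{-\infty}^{\infty} |f_n(x) - f(x)|\,dx = \int_0^{1/n} |0 - 1|\,dx + \int_1^{1+1/n} |1 - 0|\,dx = \frac{2}{n} \to 0 \text{ as } n \to \infty.$$

From this and (5.5), it follows that $F_n(B) \to F(B)$ for all $B$.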


6 LIMIT THEOREMS 
6.1 The Law of Large Numbers (LLN) 


Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed (i.i.d.) r.v.’s. Let
$S_n = X_1 + \ldots + X_n$, and $\overline{X}_n = S_n/n$. Set $m = E\{X_i\}$, provided that it exists. It does not depend
on $i$ since the $X$’s are identically distributed. The LLN says that though $\overline{X}_n$ is random for each
particular $n$, this randomness vanishes as $n$ gets larger, and $\overline{X}_n$ approaches $m$. The point
here, however, is that since $\overline{X}_n$ is not a sequence of numbers but of random variables, the
very notion of convergence should be defined properly. We use the notions of convergence
almost surely (with probability one) and in probability discussed in Section 5.


Theorem 9 (The strong LLN)
(a) Suppose that $E\{|X_i|\}$ is finite. Then $\overline{X}_n \stackrel{a.s.}{\to} m$ (that is, $\overline{X}_n$ converges to $m$ almost
surely). More specifically,

$$P(\overline{X}_n \to m) = 1. \tag{6.1}$$

(b) If for some $c$

$$P(\overline{X}_n \to c) = 1, \tag{6.2}$$

then $E\{|X_i|\}$ is finite, and $c = m$.


As was noted in Section 5, almost sure convergence implies convergence in probability,
so we state




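Corollary 10 (The weak LLN) Suppose that $E\{|X_i|\}$ is finite. Then $\overline{X}_n \stackrel{P}{\to} m$ (that is, $\overline{X}_n$
converges to $m$ in probability). More specifically, for any $\varepsilon > 0$,

$$P(|\overline{X}_n - m| > \varepsilon) \to 0 \text{ as } n \to \infty. \tag{6.3}$$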


The LLN is a mathematical theorem but it may be viewed as a fundamental law of nature.
First of all, due to this law, the random behavior of a great many real processes in the
long run exhibits a sort of stability. Consider, for instance, the consecutive values of daily
income of a company in the long run. These values (denote them by $X_1, X_2, \ldots$) may be
essentially random, uncertain. However, if the $X$’s are i.i.d., the average income per day
for $n$ days, that is, $\overline{X}_n = \frac{1}{n}(X_1 + \ldots + X_n)$, is for large $n$ practically certain,
being close to the non-random value equal to the expected value of the $X$’s.

No less important is that the LLN allows us to estimate the mean values of r.v.’s.
Suppose, for example, that we want to estimate the mean of the highest August temperature
in a particular area. The only thing we can do is to review such a temperature, say, in the
last 100 years and compute the average. Our intuition tells us that the average will be close
to the mean value (though, perhaps, not exactly equal to it), and the LLN confirms it and
explains in which sense it is true.
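The stabilization of the average is easy to observe in a simulation. The following is a minimal sketch, not a part of the theory: the exponential distribution with mean 2, the seed, and the function name `sample_mean` are assumptions made only for illustration.

```python
import random

def sample_mean(n, rate=0.5, seed=12345):
    """Average of n i.i.d. exponential draws; each draw has mean m = 1/rate."""
    rng = random.Random(seed)
    total = sum(rng.expovariate(rate) for _ in range(n))
    return total / n

# As n grows, the printed averages settle near m = 2.
for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```

A single run, of course, illustrates the statement rather than proves it; the LLN is what guarantees that such behavior is typical.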

Proofs of the LLN and detailed discussions may be found in many textbooks on Proba- 
bility; see, e.g., [27], [38], [116], [120], [122], [129]. 


6.2 The Central Limit Theorem (CLT) 
Let $E\{X_i^2\}$ be finite, and $\sigma^2 = \mathrm{Var}\{X_i\}$. Since the $X$’s are i.i.d., we have $E\{S_n\} = mn$
and $\mathrm{Var}\{S_n\} = \sigma^2 n$. Consider the normalized sum

$$S_n^* = \frac{S_n - E\{S_n\}}{\sqrt{\mathrm{Var}\{S_n\}}} = \frac{S_n - mn}{\sigma\sqrt{n}}.$$

It is worth emphasizing that the normalized r.v. $S_n^*$ is just the same sum $S_n$ considered
in an appropriate scale: after normalization, $E\{S_n^*\} = 0$, and $\mathrm{Var}\{S_n^*\} = 1$ (see Example
2.6-1).


Theorem 11 (The CLT) For any $x$,

$$P(S_n^* \le x) \to \Phi(x) \text{ as } n \to \infty,$$

where $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du$, the standard normal distribution function.


Corollary 12 For any $a$ and $b$,

$$P(a \le S_n^* \le b) \to \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\,dx \text{ as } n \to \infty.$$


In spite of its simple statement, the CLT deals with a deep and important fact. The
theorem says that as the number of the terms in the sum $S_n$ is getting larger, the influence
of separate terms is diminishing and the distribution of $S_n$ is getting close to a standard
distribution (namely, normal) regardless of which distribution the separate terms have.
It may be continuous or discrete, or neither of these, uniform, exponential, binomial, or
anything else: provided that the variance of the terms is finite, the distribution of the sum
$S_n$ for large $n$ may be well approximated by a normal distribution.

The theorem has an enormous number of applications because it allows one to estimate 
probabilities concerning sums of r.v.’s in situations where the distribution of the terms X’s 
is not known; more precisely, when one knows or has estimated only the most “rough” 
characteristics: the mean and standard deviation. 
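To see how such an approximation works, one may compare a simulated value of $P(S_n^* \le x)$ with $\Phi(x)$. The sketch below is an illustration under our own assumptions: the terms are taken uniform on $[0,1]$, $n = 30$, and the function names and the seed are hypothetical choices. Only the mean $1/2$ and the standard deviation $\sqrt{1/12}$ of a term enter the normalization.

```python
import math
import random

def std_normal_cdf(x):
    """Phi(x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normalized_sum_cdf(x, n=30, trials=20_000, seed=2024):
    """Monte Carlo estimate of P(S_n* <= x) for i.i.d. terms uniform on [0, 1]."""
    rng = random.Random(seed)
    m, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and standard deviation of one term
    hits = 0
    for _ in range(trials):
        s = sum(rng.random() for _ in range(n))
        if (s - m * n) / (sigma * math.sqrt(n)) <= x:
            hits += 1
    return hits / trials

for x in (-1.0, 0.0, 1.0):
    print(x, normalized_sum_cdf(x), std_normal_cdf(x))
```

The two printed columns should be close to each other; the remaining discrepancy combines the Monte Carlo error and the error of the normal approximation for finite $n$.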

The same theorem may be useful when formally the distribution of the separate terms is
known, but it is difficult (if at all possible) to find a tractable representation for the distribution
of the sum.

Proofs, with use of different methods, and generalizations of this theorem may be found 
in many textbooks. Often, to make proofs more transparent, in textbooks some additional 
unnecessary conditions are imposed. Complete proofs with conditions close to necessary 
may be found, for example, in [27], [38], [120], [122], [129]. 


7 CONDITIONAL EXPECTATIONS. CONDITIONING 


Usually, standard courses in Probability do not pay much attention to conditional ex- 
pectations. For us this topic is quite important; so we consider it in detail and even give 
exercises on conditioning in Section 9. 


7.1 Conditional expectation given a r.v. 


Let X and Y be r.v.’s. Our immediate goal is to define the quantity which we will denote 
by E{Y |X =x}, and which we will understand exactly as it sounds: the mean value of Y 
given that X took on a particular value x. 


7.1.1 The discrete case 


Let $X$ be a discrete r.v. or a r.vec. (we will see that it does not matter in our case) taking
values $x_1, x_2, \ldots$. Below we will often omit the index $i$ of $x_i$, writing just $x$ but keeping in
mind that we consider only $x$’s which coincide with one of the values of $X$. In particular,
this means that for such $x$’s, $P(X = x) \ne 0$.

We define the conditional cumulative distribution function (or simply conditional d.f.) of
$Y$ given $X = x$ as the function

$$F_Y(y \mid X = x) = P(Y \le y \mid X = x) = \frac{P(Y \le y,\ X = x)}{P(X = x)}. \tag{7.1.1}$$

By analogy with the standard representation (2.1.5), let us set

$$E\{Y \mid X = x\} = \int_{-\infty}^{\infty} y\,dF_Y(y \mid X = x). \tag{7.1.2}$$




For $E\{Y \mid X = x\}$ so defined, we will also use the notation $m_{Y|X}(x)$. The function $m_{Y|X}(x)$
is often called a regression function of $Y$ on $X$. When it does not cause misunderstanding,
we omit the index $Y|X$ in $m_{Y|X}(x)$.


If $Y$ is also discrete and takes on values $y_1, y_2, \ldots$, then the definition (7.1.2) may be
written as

$$E\{Y \mid X = x\} = \sum_j y_j P(Y = y_j \mid X = x). \tag{7.1.3}$$

EXAMPLE 1. Let a r.vec. $(X,Y)$ take on vector-values $(0,1)$, $(1,0)$, $(0,-1)$ with
probabilities $\frac{1}{2}$, $\frac{1}{3}$, $\frac{1}{6}$, respectively; see also Fig.16. If $X = 1$, then $Y$ takes on only one value 0,
so $P(Y = 0 \mid X = 1) = 1$, and

$$m(1) = E\{Y \mid X = 1\} = 0.$$

[FIGURE 16. The points $(0,1)$, $(1,0)$, and $(0,-1)$, carrying probabilities 1/2, 1/3, and 1/6, respectively.]

In accordance with (7.1.3),

$$m(0) = E\{Y \mid X = 0\} = 1 \cdot P(Y = 1 \mid X = 0) + 0 \cdot P(Y = 0 \mid X = 0) + (-1) \cdot P(Y = -1 \mid X = 0) = 1 \cdot \frac{1/2}{2/3} - 1 \cdot \frac{1/6}{2/3} = \frac{3}{4} - \frac{1}{4} = \frac{1}{2}.$$
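The calculation of Example 1 is a direct application of (7.1.3) and may be reproduced mechanically. In the sketch below, the dictionary `joint` and the function name `regression` are our own illustrative choices; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Joint distribution of Example 1: {(x, y): probability}.
joint = {(0, 1): Fraction(1, 2), (1, 0): Fraction(1, 3), (0, -1): Fraction(1, 6)}

def regression(joint, x):
    """m(x) = E{Y | X = x} computed via (7.1.3)."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)  # P(X = x)
    # Sum of y * P(Y = y, X = x), divided by P(X = x).
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / p_x

print(regression(joint, 0), regression(joint, 1))  # prints: 1/2 0
```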


EXAMPLE 2 is classical. Let $N_1$ and $N_2$ be independent Poisson r.v.’s with parameters
$\lambda_1$ and $\lambda_2$, respectively. Find $E\{N_1 \mid N_1 + N_2 = n\}$.

Thus, $N_1$ plays the role of $Y$, and $N_1 + N_2$ plays the role of $X$; we also replaced $x$ by $n$. For
example, there are two groups of clients of an insurance company, and $N_i$ is the number of
claims coming from the $i$th group. We are interested in the mean number of claims coming
from the first group given the total number of claims.

It is known that the conditional distribution of $N_1$ given $N_1 + N_2 = n$ is binomial. More precisely,

$$P(N_1 = k \mid N_1 + N_2 = n) = \binom{n}{k} p^k (1-p)^{n-k},$$

where

$$p = \frac{\lambda_1}{\lambda_1 + \lambda_2}.$$

We give a formal proof of this fact in Section 3.3.1.2, where we consider this phenomenon
in detail. Now, to give an example on conditional expectation, it is enough to take this fact
for granted.

Since the mean value of the binomial distribution is $np$,

$$E\{N_1 \mid N_1 + N_2 = n\} = \frac{n\lambda_1}{\lambda_1 + \lambda_2}. \tag{7.1.4}$$
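Although the formal proof is postponed, the binomial form of the conditional distribution and formula (7.1.4) are easy to check numerically. The sketch below works straight from the definition of conditional probability; the parameter values $\lambda_1 = 2$, $\lambda_2 = 3$, $n = 7$ and all function names are assumptions made for illustration.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def conditional_pmf(k, n, lam1, lam2):
    """P(N1 = k | N1 + N2 = n), using only independence and the Poisson probabilities."""
    joint = poisson_pmf(k, lam1) * poisson_pmf(n - k, lam2)
    # P(N1 + N2 = n) as the sum over all ways to split n between the two groups.
    total = sum(poisson_pmf(j, lam1) * poisson_pmf(n - j, lam2) for j in range(n + 1))
    return joint / total

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

lam1, lam2, n = 2.0, 3.0, 7
p = lam1 / (lam1 + lam2)
cond_mean = sum(k * conditional_pmf(k, n, lam1, lam2) for k in range(n + 1))
print(cond_mean, n * lam1 / (lam1 + lam2))  # the two numbers should agree up to rounding
```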


Let us return to the regression function $m(x) = m_{Y|X}(x)$. This is the mean value of $Y$
given $X = x$. Since $X$ is a random variable, its values $x$ may be different, random. To
reflect this circumstance, let us replace in $m(x)$ the argument $x$ by the r.v. $X$ itself, that is,
consider the r.v. $m(X)$. This is a function of $X$, and its significance is the same as above:
it is the conditional mean value of $Y$ given a value of $X$. However, since $X$ is random, the
conditional mean value of $Y$ given $X$ is also random, and the value of $m(X)$ depends on
which value $X$ will assume.

The r.v. m(X) has a special notation: E{Y |X}, and is called the conditional expectation 
of Y given X. It is important to keep in mind that this is a r.v., and since E{Y|X} is a 
function of X, its value is completely determined by the value of X. 


EXAMPLE 3. In the situation of Example 1, $X$ takes on two values: 0 and 1 with
probabilities $\frac{2}{3}$ and $\frac{1}{3}$, respectively; see also Fig.16. Hence, $m(X)$ takes on the values
$m(0) = \frac{1}{2}$ and $m(1) = 0$ with the above probabilities. Consequently, $E\{Y \mid X\}$ is a r.v.
taking values $\frac{1}{2}$ and 0 with respective probabilities $\frac{2}{3}$ and $\frac{1}{3}$.

EXAMPLE 4. Let us return to Example 2 and set $N = N_1 + N_2$, $\lambda = \lambda_1 + \lambda_2$. By virtue
of (7.1.4), $m_{N_1|N}(n) = \frac{\lambda_1}{\lambda} n$, and hence

$$E\{N_1 \mid N\} = \frac{\lambda_1}{\lambda} N.$$


7.1.2 The case of continuous distributions 


If the denominator in (7.1.1) equals zero, we cannot write this representation as is. For
example, this is the case for all $x$’s in the continuous case. So, we should define $F_Y(y \mid X = x)$
and $E\{Y \mid X = x\}$ in another way. In this section, we assume that the vector $(X,Y)$ has a
joint density $f(x,y)$. Denote by $f_X(x)$ the marginal density of $X$ and consider the function

$$f_{Y|X}(y \mid x) = \frac{f(x,y)}{f_X(x)}, \tag{7.1.5}$$

provided $f_X(x) \ne 0$. If $f_X(x) = 0$, we set $f_{Y|X}(y \mid x) = 0$ by definition.

We call $f_{Y|X}(y \mid x)$ the conditional density of $Y$ given $X = x$. When it cannot cause
confusion, we omit the index $Y|X$ in $f_{Y|X}(y \mid x)$.

To clarify the significance of definition (7.1.5), let us consider infinitesimally small
intervals $[x, x+dx]$, $[y, y+dy]$ and assume $f_X(x) \ne 0$. Reasoning somewhat heuristically, we
represent the probability $P(x \le X \le x+dx)$ as $f_X(x)dx$; see also Section 1.3.1. Similarly,
we write $P(y \le Y \le y+dy,\ x \le X \le x+dx) = f(x,y)\,dx\,dy$. Then

$$P(y \le Y \le y+dy \mid x \le X \le x+dx) = \frac{P(y \le Y \le y+dy,\ x \le X \le x+dx)}{P(x \le X \le x+dx)} = \frac{f(x,y)\,dx\,dy}{f_X(x)\,dx} = f(y \mid x)\,dy. \tag{7.1.6}$$

Since the interval $[x, x+dx]$ is infinitesimally small, we can view
$P(y \le Y \le y+dy \mid x \le X \le x+dx)$ as $P(y \le Y \le y+dy \mid X = x)$, which leads to

$$P(y \le Y \le y+dy \mid X = x) = f(y \mid x)\,dy.$$

Thus, $f(y \mid x)$ indeed plays the role of the density of $Y$ when $X$ assumes a value $x$, which
justifies the formal definition (7.1.5). Note that the reasoning above is just a clarification.
Formally, we adopt (7.1.5) as a definition.




Now, again as a definition, we set the conditional d.f.

$$F_Y(y \mid X = x) = \int_{-\infty}^{y} f(u \mid x)\,du, \tag{7.1.7}$$

and, by analogy with (7.1.2),

$$m(x) = m_{Y|X}(x) = E\{Y \mid X = x\} = \int_{-\infty}^{\infty} y f(y \mid x)\,dy. \tag{7.1.8}$$

Note also that by virtue of (7.1.7), $dF_Y(y \mid X = x) = f(y \mid x)\,dy$ (the conditional density is
the derivative of the conditional d.f.). Hence, we can write (7.1.8) as

$$m(x) = m_{Y|X}(x) = E\{Y \mid X = x\} = \int_{-\infty}^{\infty} y\,dF_Y(y \mid X = x). \tag{7.1.9}$$

Comparing it with (7.1.2), we see that the latter representation is true for both the discrete and
continuous cases.


Now, as above, we define the expected value of Y given X as 


E{Y |X} =m(X). 


EXAMPLE 1. Let a vector $Z = (Z_1, Z_2)$ be uniformly
distributed on the unit disk $O_1 = \{z_1^2 + z_2^2 \le 1\}$. The joint
density of $Z$ is $f(z_1, z_2) = 1/\pi$ if $(z_1, z_2) \in O_1$, and $= 0$ otherwise;
see also Fig.17.

[FIGURE 17.]

First, we find $f_{Z_2|Z_1}(z_2 \mid z_1)$, so the role of $X$ above is
played by $Z_1$, and the role of $Y$ by $Z_2$.

If $Z_1 = z_1$, then values of $Z_2$ lie on the interval
$[-\sqrt{1 - z_1^2}, \sqrt{1 - z_1^2}]$; see again Fig.17. Since the joint
distribution is uniform, we can guess that the same is true
for the conditional distribution of $Z_2$ given $Z_1$. Let us show it rigorously.

For the marginal density of $Z_1$, we have

$$f_{Z_1}(z_1) = \int_{-\infty}^{\infty} f(z_1, z_2)\,dz_2 = \int_{-\sqrt{1 - z_1^2}}^{\sqrt{1 - z_1^2}} \frac{1}{\pi}\,dz_2 = \frac{2\sqrt{1 - z_1^2}}{\pi}. \tag{7.1.10}$$

Then, by definition,

$$f_{Z_2|Z_1}(z_2 \mid z_1) = \frac{f(z_1, z_2)}{f_{Z_1}(z_1)} = \frac{1}{2\sqrt{1 - z_1^2}} \text{ if } z_2 \in \left[-\sqrt{1 - z_1^2}, \sqrt{1 - z_1^2}\right], \text{ and } = 0 \text{ otherwise.}$$

For a fixed $z_1$, as a function of $z_2$, the last density is constant. Hence, the conditional
distribution under consideration is indeed uniform on $[-\sqrt{1 - z_1^2}, \sqrt{1 - z_1^2}]$.




Let us consider conditional expectations, say, $E\{Z_2 \mid Z_1\}$ and $E\{Z_2^2 \mid Z_1\}$. We can use the
general formula (7.1.8), but it is not necessary.

Indeed, for a r.v. $\xi$ uniformly distributed on $[-a, a]$ for some $a$, we have $E\{\xi\} = 0$ and
$E\{\xi^2\} = a^2/3$; see Section 3.2.1. As we now know, given $Z_1$ the r.v. $Z_2$ is uniformly
distributed on $[-(1 - Z_1^2)^{1/2}, (1 - Z_1^2)^{1/2}]$. Hence, $E\{Z_2 \mid Z_1\} = 0$ (which could be predicted
by using the symmetry argument), and

$$E\{Z_2^2 \mid Z_1\} = (1 - Z_1^2)/3.$$
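The last formula may be checked by a rough simulation. Since the event $\{Z_1 = z_1\}$ has zero probability, the sketch below conditions instead on $Z_1$ falling into a narrow window around $z_1$, in the spirit of the heuristic reasoning behind (7.1.6). The window half-width, the number of draws, the seed, and the function name are all assumptions made for illustration.

```python
import random

def second_moment_given_z1(z1, half_width=0.01, draws=40_000, seed=7):
    """Estimate E{Z2^2 | Z1 close to z1} for a point uniform on the unit disk."""
    rng = random.Random(seed)
    total, kept = 0.0, 0
    for _ in range(draws):
        a = rng.uniform(z1 - half_width, z1 + half_width)  # first coordinate in the window
        b = rng.uniform(-1.0, 1.0)                          # candidate second coordinate
        if a * a + b * b <= 1.0:  # keep only points of the strip that fall inside the disk
            total += b * b
            kept += 1
    return total / kept

z1 = 0.5
print(second_moment_given_z1(z1), (1 - z1**2) / 3)  # estimate vs. the exact value 0.25
```

Points accepted in this way are uniformly distributed on the part of the disk lying in the strip, which is exactly the conditional distribution given that $Z_1$ is in the window.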


EXAMPLE 2. Let $X_1, X_2$ be independent r.v.’s with densities $f_1(x)$, $f_2(x)$, respectively,
and let $S = X_1 + X_2$. In many particular problems, it is important to realize that

$$f_{X_1|S}(x \mid s) = \frac{f_1(x) f_2(s - x)}{f_S(s)}. \tag{7.1.11}$$

Since the denominator is exactly what the definition (7.1.5) requires, to prove (7.1.11) it
suffices to show that the joint density of the vector $(X_1, S)$ is

$$f(x, s) = f_1(x) f_2(s - x). \tag{7.1.12}$$

Heuristically, it is almost obvious. For any r.v. $X$, its density $f(x)$ is connected with the
probability that $X$ will take on a value close to $x$. In our case, if $X_1 \approx x$ and $S \approx s$, then
$X_2 \approx s - x$. Since $X_1, X_2$ are independent, $P(X_1 \approx x, S \approx s) = P(X_1 \approx x, X_2 \approx s - x) =
P(X_1 \approx x)P(X_2 \approx s - x)$. This is reflected in (7.1.12). A rigorous proof may be found, e.g.,
in [122, p.197].


EXAMPLE 3. We apply the result of the previous example to the particular case when
$X_1$ and $X_2$ are exponential with the same parameter $a$. It suffices to consider $f_{X_1|S}(x \mid s)$ for
$x \in [0, s]$. Indeed, $X_1$ and $X_2$ are positive, and given their sum equals $s$, both terms are not
greater than $s$.

Later, in Proposition 4 from Section 2.2.1.2, we will prove that $f_S(s) = f_{a2}(s)$, where
$f_{a2}$ is the $\Gamma$-density with parameters $(a, 2)$. So, $f_S(s) = a^2 s e^{-as}$, and by (7.1.11), for $x \le s$

$$f_{X_1|S}(x \mid s) = \frac{a e^{-ax} \cdot a e^{-a(s - x)}}{a^2 s e^{-as}} = \frac{1}{s}.$$

Thus, the conditional distribution is uniform on $[0, s]$. The result is nice: though the values
of the exponential r.v. $X_1$ are not equally likely, given the information that the sum $X_1 + X_2$
is $s$, the r.v. $X_1$ may take on any value from $[0, s]$ with equal likelihood. This is strongly
connected with the fact that the $X$’s are exponential; see also Exercise 4.

Once we know the conditional distribution, we can compute various expectations. In
particular, $E\{X_1 \mid S\} = S/2$ (as the mean of the distribution uniform on $[0, S]$). If we define
the conditional variance as the variance of the corresponding conditional distribution, we
can write that $\mathrm{Var}\{X_1 \mid S\} = S^2/12$.
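The cancellation that makes the conditional density constant can also be seen numerically. The following sketch simply evaluates the right member of (7.1.11) for exponential terms at several points $x$; the values $s = 4$, $a = 1.5$ and the function name are illustrative assumptions.

```python
import math

def conditional_density(x, s, a):
    """f_{X1|S}(x|s) from (7.1.11) for two exponential terms with the same parameter a."""
    f1 = a * math.exp(-a * x)          # density of X1 at x
    f2 = a * math.exp(-a * (s - x))    # density of X2 at s - x
    f_s = a**2 * s * math.exp(-a * s)  # density of S = X1 + X2 at s
    return f1 * f2 / f_s

s, a = 4.0, 1.5
values = [conditional_density(x, s, a) for x in (0.5, 1.0, 2.0, 3.5)]
print(values, 1 / s)  # every entry should coincide with 1/s up to rounding
```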




7.2 Properties of conditional expectations 


1. Let us recall that the conditional expectation $E\{Y \mid X\}$ is a r.v. The main and
extremely important property of conditional expectation is that for any $X$ and for any $Y$
with a finite $E\{Y\}$,

$$E\{E\{Y \mid X\}\} = E\{Y\}. \tag{7.2.1}$$


Thus, 


If we “condition Y on X” and then compute the expected value of the conditional 
expectation, we come back to the original unconditional expectation E {Y }. 


We call (7.2.1) the formula for total expectation or the law of total expectation. In the 
next section, we consider some examples of applications of this formula, and later 
on we will apply it repeatedly. 


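The validity of (7.2.1) is based on the fact that the part in the left member of (7.2.1) that
concerns $X$ cancels. To demonstrate it, consider the continuous case. Since $E\{Y \mid X\} =
m(X)$, by virtue of (7.1.5) and (7.1.8), we have

$$\begin{aligned}
E\{E\{Y \mid X\}\} &= E\{m(X)\} = \int_{-\infty}^{\infty} m(x) f_X(x)\,dx = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} y f(y \mid x)\,dy \right) f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y\,\frac{f(x,y)}{f_X(x)}\,f_X(x)\,dy\,dx = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y f(x,y)\,dy\,dx \\
&= \int_{-\infty}^{\infty} y \left( \int_{-\infty}^{\infty} f(x,y)\,dx \right) dy = \int_{-\infty}^{\infty} y f_Y(y)\,dy = E\{Y\},
\end{aligned}$$

where $f_Y(y)$ is the marginal density of $Y$.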
The proof in the discrete case is similar; we should just replace integrals by the
corresponding sums.
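In the discrete case, (7.2.1) can be verified directly on a small distribution. The sketch below uses the distribution from Example 7.1.1-1; the dictionary `joint` and the helper names are our own illustrative choices, and the left and right members of (7.2.1) are computed independently of each other.

```python
from fractions import Fraction

# Joint distribution {(x, y): probability} from Example 7.1.1-1.
joint = {(0, 1): Fraction(1, 2), (1, 0): Fraction(1, 3), (0, -1): Fraction(1, 6)}

def marginal_x(joint):
    """Distribution of X: {x: P(X = x)}."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0) + p
    return px

def m(joint, x):
    """Regression function m(x) = E{Y | X = x}."""
    px = marginal_x(joint)[x]
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / px

lhs = sum(m(joint, x) * p for x, p in marginal_x(joint).items())  # E{E{Y|X}} = E{m(X)}
rhs = sum(y * p for (_, y), p in joint.items())                    # E{Y}
print(lhs, rhs)  # prints: 1/3 1/3
```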
The next properties are straightforward and quite plausible from a heuristic point of view.
We omit proofs.


2. For any number $c$ and r.v.’s $X$, $Y$, $Y_1$, and $Y_2$,

$$E\{cY \mid X\} = cE\{Y \mid X\} \text{ and } E\{Y_1 + Y_2 \mid X\} = E\{Y_1 \mid X\} + E\{Y_2 \mid X\}. \tag{7.2.2}$$

3. If r.v.’s $X$ and $Y$ are independent, then $E\{Y \mid X\}$ is not random and equals $E\{Y\}$.

4. Consider $Y = g(X)Z$, where $Z$ is a r.v. and $g(x)$ is a function. Then

$$E\{g(X)Z \mid X\} = g(X)E\{Z \mid X\}. \tag{7.2.3}$$

In particular,

$$E\{g(X) \mid X\} = g(X), \tag{7.2.4}$$

and $E\{X \mid X\} = X$. Intuitively, it is quite understandable. When conditioning on $X$,
we view $X$ as a constant, and hence $g(X)$ may be brought outside of the conditional
expectation.


EXAMPLE 1. (The “beta” of a security). In Finance, the “beta” ($\beta$) of a security (say,
a stock or a portfolio of stocks) is a characteristic describing the relation of the security’s
return (that is, the income per \$1 investment) with that of the financial market as a whole.
In other words, $\beta$ shows how the return of a security depends on the situation in the market
on the average.

Let Y be the (future) return of a security, and let X be the future value of a market index, 
i.e., a global characteristic monitoring either the value of the market as a whole, or an 
essential part of it. (Typical examples are Dow Jones or S&P indices.) 

Let us adopt the following simple model that sometimes works well: $Y = \xi X + \varepsilon$, where
coefficients $\xi$ and $\varepsilon$ are random but do not depend on $X$.

Loosely put, $\varepsilon$ characterizes the “random factors” that are relevant to the security and are
not associated with the market. The random coefficient $\xi$ reflects the impact of the global
market situation on the value of the security.

In view of Properties 2-4, $E\{Y \mid X\} = E\{\xi X + \varepsilon \mid X\} = E\{\xi X \mid X\} + E\{\varepsilon \mid X\} = X E\{\xi \mid X\} +
E\{\varepsilon \mid X\} = X E\{\xi\} + E\{\varepsilon\}$.

The mean value $E\{\xi\}$ is denoted by $\beta$ and called “beta.” We set $\alpha = E\{\varepsilon\}$, and eventually
write $E\{Y \mid X\} = \beta X + \alpha$.

Note that $\beta$ may be negative. For example, as a rule, the price of gold is growing when
the market is dropping.

By definition, the market itself has a beta of 1. Indeed, in this case $Y = X$, and hence
$\xi = 1$, $\varepsilon = 0$.

If the variation of the stock return is larger on the average than that of the market, the
absolute value of beta is greater than 1, whereas for a stock whose return varies to a lesser
extent than the market’s return, $|\beta| < 1$. The reader may find particular values of $\beta$ for many
stocks on Web sites concerning the stock market.


EXAMPLE 2. Let $X_1, X_2$ be i.i.d. r.v.’s. Find $E\{X_1 \mid X_1 + X_2\}$. (What is the mean value of
one term (addend) given the value of the sum?) By the symmetry argument, we can guess
that the answer is simple:

$$E\{X_1 \mid X_1 + X_2\} = \frac{1}{2}(X_1 + X_2).$$

This is, indeed, true, and may be shown as follows. Since $X_1, X_2$ are independent and have
the same distribution, $E\{X_1 \mid X_1 + X_2\} = E\{X_2 \mid X_1 + X_2\}$ by symmetry. Then, using the
properties above, we have $2E\{X_1 \mid X_1 + X_2\} = E\{X_1 \mid X_1 + X_2\} + E\{X_2 \mid X_1 + X_2\} =
E\{X_1 + X_2 \mid X_1 + X_2\} = X_1 + X_2$.




7.3 Conditioning and some useful formulas 


The formula for total expectation

$$E\{Y\} = E\{E\{Y \mid X\}\} \tag{7.3.1}$$


proves to be a very useful tool in many problems including those where the original setup 
does not involve any conditioning. 

Formulas below present some modifications or particular versions of (7.3.1), which we 
will repeatedly use throughout the whole book. 


7.3.1 A formula for variance


Let us consider a counterpart of (7.3.1) for variances. We show that

$$\mathrm{Var}\{Y\} = E\{\mathrm{Var}\{Y \mid X\}\} + \mathrm{Var}\{E\{Y \mid X\}\}, \tag{7.3.2}$$

where $\mathrm{Var}\{Y \mid X\}$ is the variance of $Y$ with respect to the conditional distribution of $Y$ given
$X$. In particular, we can write that $\mathrm{Var}\{Y \mid X\} = E\{Y^2 \mid X\} - (E\{Y \mid X\})^2$.

To memorize (7.3.2), notice that in this formula the order of the operations $E\{\cdot\}$ and
$\mathrm{Var}\{\cdot\}$ alternates. To prove (7.3.2), we make use of (7.3.1) in the following way:

$$\begin{aligned}
\mathrm{Var}\{Y\} &= E\{Y^2\} - (E\{Y\})^2 = E\{E\{Y^2 \mid X\}\} - (E\{E\{Y \mid X\}\})^2 \\
&= E\{E\{Y^2 \mid X\} - (E\{Y \mid X\})^2\} + E\{(E\{Y \mid X\})^2\} - (E\{E\{Y \mid X\}\})^2 \\
&= E\{\mathrm{Var}\{Y \mid X\}\} + \mathrm{Var}\{E\{Y \mid X\}\}.
\end{aligned}$$


We skip here particular examples; in the following chapters, we use (7.3.2) repeatedly. 
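As a quick sanity check of (7.3.2), both members can be computed exactly for a small discrete distribution. The sketch below again takes the distribution of Example 7.1.1-1; the variable and function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

# Joint distribution {(x, y): probability} from Example 7.1.1-1.
joint = {(0, 1): Fraction(1, 2), (1, 0): Fraction(1, 3), (0, -1): Fraction(1, 6)}

def cond_moment(joint, x, power):
    """E{Y**power | X = x} for a discrete joint distribution."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return sum(y**power * p for (xi, y), p in joint.items() if xi == x) / p_x

# Distribution of X.
p_x = {}
for (x, _), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p

mean_y = sum(y * p for (_, y), p in joint.items())
var_y = sum(y**2 * p for (_, y), p in joint.items()) - mean_y**2

# E{Var{Y|X}}: the conditional variances averaged over the distribution of X.
mean_cond_var = sum((cond_moment(joint, x, 2) - cond_moment(joint, x, 1)**2) * p
                    for x, p in p_x.items())
# Var{E{Y|X}}: the variance of the r.v. m(X), whose mean is E{Y} by (7.3.1).
var_cond_mean = sum(cond_moment(joint, x, 1)**2 * p for x, p in p_x.items()) - mean_y**2

print(var_y, mean_cond_var, var_cond_mean)  # prints: 5/9 1/2 1/18
```

Indeed, $\frac{1}{2} + \frac{1}{18} = \frac{5}{9}$, in accordance with (7.3.2).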


7.3.2 More detailed representations of the formula for total expectation 


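We use representation (7.1.9) which is true for both continuous and discrete cases. Since
$E\{Y \mid X\} = m(X)$, from (7.3.1) and (7.1.9) it follows that

$$E\{Y\} = E\{m(X)\} = \int_{-\infty}^{\infty} m(x)\,dF_X(x) = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} y\,dF_Y(y \mid X = x) \right) dF_X(x). \tag{7.3.3}$$

In the case when all densities exist, we can rewrite it as

$$E\{Y\} = \int_{-\infty}^{\infty} m(x) f_X(x)\,dx = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} y f(y \mid x)\,dy \right) f_X(x)\,dx. \tag{7.3.4}$$

Let us now recall that all r.v.’s we consider are defined on some sample space $\Omega = \{\omega\}$,
i.e., each r.v. $X$ is a function $X(\omega)$. Consider an event $A \subset \Omega$, and the indicator of $A$, that
is, the r.v.

$$1_A = 1_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A. \end{cases} \tag{7.3.5}$$

Thus, $1_A$ assumes the value 1 if $A$ occurs, and the value 0 otherwise. Clearly, $E\{1_A\} =
1 \cdot P(A) + 0 \cdot P(\overline{A})$, where $\overline{A}$ is the complement of $A$, so

$$E\{1_A\} = P(A). \tag{7.3.6}$$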




By analogy with (7.3.6), we define the quantity $P(A \mid X) = E\{1_A \mid X\}$. The conditional
probability $P(A \mid X)$ should be understood as it is written: this is the probability of $A$ given
the r.v. $X$. Note that $P(A \mid X)$ is a function of $X$, and it is a random variable.

Setting $Y = 1_A$ in (7.3.1), we have $P(A) = E\{1_A\} = E\{E\{1_A \mid X\}\} = E\{P(A \mid X)\}$. Thus,

$$P(A) = E\{P(A \mid X)\}. \tag{7.3.7}$$

In this particular case, $m(x) = P(A \mid X = x)$, and by virtue of (7.3.3), we have

$$P(A) = \int_{-\infty}^{\infty} P(A \mid X = x)\,dF_X(x). \tag{7.3.8}$$

We can view it as a version of the formula for total probability.
If $X$ has a density $f_X(x)$, the last relation may be rewritten as

$$P(A) = \int_{-\infty}^{\infty} P(A \mid X = x) f_X(x)\,dx. \tag{7.3.9}$$


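EXAMPLE 1 is classical. Let $\xi_1, \xi_2$ be independent exponential r.v.’s with respective
parameters $a_1, a_2$. Find $P(\xi_2 > \xi_1)$. A particular illustration is considered in Exercise 9.
By (7.3.9) and by virtue of the independence condition,

$$\begin{aligned}
P(\xi_2 > \xi_1) &= \int_0^{\infty} P(\xi_2 > \xi_1 \mid \xi_1 = x) f_{\xi_1}(x)\,dx = \int_0^{\infty} P(\xi_2 > x \mid \xi_1 = x)\,a_1 e^{-a_1 x}\,dx \\
&= \int_0^{\infty} P(\xi_2 > x)\,a_1 e^{-a_1 x}\,dx = \int_0^{\infty} e^{-a_2 x} a_1 e^{-a_1 x}\,dx = a_1 \int_0^{\infty} e^{-(a_1 + a_2)x}\,dx = \frac{a_1}{a_1 + a_2}.
\end{aligned}$$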
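The answer $a_1/(a_1 + a_2)$ may be compared with the frequency of the event $\{\xi_2 > \xi_1\}$ in a simulation. The sketch below is an illustration only; the parameter values $a_1 = 2$, $a_2 = 3$, the seed, and the function name are our assumptions.

```python
import random

def prob_second_exceeds_first(a1, a2, trials=100_000, seed=99):
    """Monte Carlo estimate of P(xi_2 > xi_1) for independent exponential r.v.'s."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials)
               if rng.expovariate(a2) > rng.expovariate(a1))
    return hits / trials

a1, a2 = 2.0, 3.0
print(prob_second_exceeds_first(a1, a2), a1 / (a1 + a2))  # estimate vs. the exact value 0.4
```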


Let us return to (7.3.8) and set $A = \{Z \le z\}$, where $Z$ is a r.v. and $z$ is a number. After
such a substitution, we get that the d.f.

$$F_Z(z) = P(Z \le z) = \int_{-\infty}^{\infty} P(Z \le z \mid X = x)\,dF_X(x) = \int_{-\infty}^{\infty} F_Z(z \mid X = x)\,dF_X(x), \tag{7.3.10}$$

where $F_Z(z \mid X = x)$ is the conditional d.f. of $Z$ given $X = x$. If all densities exist,
differentiating (7.3.10) in $z$, we obtain that the density of $Z$ is

$$f_Z(z) = \int_{-\infty}^{\infty} f_{Z|X}(z \mid x) f_X(x)\,dx. \tag{7.3.11}$$

See also Exercise 8.


EXAMPLE 2. Find the distribution of the ratio $Z = \xi_1/\xi_2$ for independent standard
normal r.v.’s $\xi_1$ and $\xi_2$. Certainly, $Z$ is well defined only if $\xi_2 \ne 0$, but since $P(\xi_2 = 0) = 0$,
we can eliminate this case from consideration.

Consider the conditional density $f_{Z|\xi_2}(z \mid x)$. We will omit the index $Z|\xi_2$, writing just
$f(z \mid x)$. Because $\xi_1$ and $\xi_2$ are independent, the density of $Z$ given $\xi_2 = x \ne 0$ is the density
of the r.v. $\xi_1/x$. If $x > 0$, this is a normal r.v. with zero mean and variance $1/x^2$. Furthermore,
a normal r.v. with zero mean is symmetric and when multiplying it by $-1$, we do not
change its distribution. Hence, for $x < 0$, we again have a normal r.v. with zero mean and
variance $1/x^2$. Thus, in both cases, $f(z \mid x) = (2\pi)^{-1/2} |x| \exp\{-z^2 x^2/2\}$.

Now, by (7.3.11),

$$\begin{aligned}
f_Z(z) &= \int_{-\infty}^{\infty} f(z \mid x) f_{\xi_2}(x)\,dx = \int_{-\infty}^{\infty} (2\pi)^{-1/2} |x| \exp\{-z^2 x^2/2\}\,(2\pi)^{-1/2} \exp\{-x^2/2\}\,dx \\
&= (2\pi)^{-1} \int_{-\infty}^{\infty} |x| \exp\{-x^2 (1 + z^2)/2\}\,dx.
\end{aligned}$$

The integrand is an even function, so $\int_{-\infty}^{\infty} = 2\int_0^{\infty}$. By the variable change $y = x\sqrt{1 + z^2}$,
we have

$$f_Z(z) = \frac{1}{\pi(1 + z^2)} \int_0^{\infty} y \exp\{-y^2/2\}\,dy = \frac{1}{\pi(1 + z^2)} \int_0^{\infty} d\left(-\exp\{-y^2/2\}\right) = \frac{1}{\pi(1 + z^2)}.$$

The distribution we have arrived at is called the Cauchy distribution.
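The result can be illustrated by simulating the ratio of two independent standard normal r.v.’s and comparing frequencies with the d.f. corresponding to the density $1/[\pi(1+z^2)]$, that is, $\frac{1}{2} + \frac{1}{\pi}\arctan z$. The sketch below is only an illustration; the function names, the number of trials, and the seed are assumptions.

```python
import math
import random

def cauchy_cdf(z):
    """d.f. obtained by integrating the density 1 / (pi * (1 + z**2))."""
    return 0.5 + math.atan(z) / math.pi

def ratio_cdf(z, trials=100_000, seed=5):
    """Monte Carlo estimate of P(xi_1 / xi_2 <= z) for independent standard normal r.v.'s."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if x2 != 0.0 and x1 / x2 <= z:  # the event {xi_2 = 0} has zero probability
            hits += 1
    return hits / trials

for z in (-2.0, 0.0, 1.0):
    print(z, ratio_cdf(z), cauchy_cdf(z))
```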


7.4 Conditional expectation given a random vector 


We will see that in the multidimensional case, we can proceed practically in the same 
fashion. 


7.4.1 General definitions 


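Let $\mathbf{X} = (X_1, \ldots, X_k)$ be a r.vec., and $Y$ be a r.v. Set $\mathbf{x} = (x_1, \ldots, x_k)$. If the distribution of
$\mathbf{X}$ is discrete and $\mathbf{x}$ is a possible value of $\mathbf{X}$, then $P(\mathbf{X} = \mathbf{x}) \ne 0$, and we can again consider
the conditional d.f.

$$F_Y(y \mid \mathbf{X} = \mathbf{x}) = P(Y \le y \mid \mathbf{X} = \mathbf{x}) = \frac{P(Y \le y,\ \mathbf{X} = \mathbf{x})}{P(\mathbf{X} = \mathbf{x})}.$$

Then we can again set

$$E\{Y \mid \mathbf{X} = \mathbf{x}\} = \int_{-\infty}^{\infty} y\,dF_Y(y \mid \mathbf{X} = \mathbf{x})$$

and proceed exactly as we did in Section 7.1.1.

Let the r.vec. $(\mathbf{X}, Y) = (X_1, \ldots, X_k, Y)$ be continuous, let $f(\mathbf{x}, y)$ be its joint density, and
$f_{\mathbf{X}}(\mathbf{x})$ be the joint density of the r.vec. $\mathbf{X}$. With respect to $(\mathbf{X}, Y)$, the density $f_{\mathbf{X}}(\mathbf{x})$ is
marginal. By analogy with what we did in Section 7.1.2, we define the conditional density

$$f(y \mid \mathbf{x}) = \frac{f(\mathbf{x}, y)}{f_{\mathbf{X}}(\mathbf{x})},$$

provided $f_{\mathbf{X}}(\mathbf{x}) \ne 0$. Then, exactly as in (7.1.8), we define

$$m(\mathbf{x}) = \int_{-\infty}^{\infty} y f(y \mid \mathbf{x})\,dy,$$

and set $E\{Y \mid \mathbf{X}\} = m(\mathbf{X})$.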




EXAMPLE 1 is similar to Example 7.1.2-1. Let a vector $Z = (Z_1, Z_2, Z_3)$ be uniformly
distributed in the unit ball $O_1 = \{z_1^2 + z_2^2 + z_3^2 \le 1\}$. We will condition $Z_3$ on $(Z_1, Z_2)$.

Set $z = (z_1, z_2, z_3)$. The joint density of $Z$ is $f(z) = \frac{3}{4\pi}$ for $z \in O_1$, and $= 0$ otherwise. (For
the total integral of $f(z)$ to be one, the density in this case should be equal to one divided
by the volume of the ball.) If the r.vec. $(Z_1, Z_2)$ assumes a value $(z_1, z_2)$, then values of $Z_3$
lie in the interval $[-\sqrt{1 - z_1^2 - z_2^2}, \sqrt{1 - z_1^2 - z_2^2}]$. The density of $(Z_1, Z_2)$ is marginal with
respect to the total density $f(z)$. So,

$$f_{(Z_1,Z_2)}(z_1, z_2) = \int_{-\sqrt{1 - z_1^2 - z_2^2}}^{\sqrt{1 - z_1^2 - z_2^2}} f(z_1, z_2, z_3)\,dz_3 = \frac{3}{4\pi} \cdot 2\sqrt{1 - z_1^2 - z_2^2} = \frac{3\sqrt{1 - z_1^2 - z_2^2}}{2\pi}. \tag{7.4.1}$$

Then

$$f_{Z_3|(Z_1,Z_2)}(z_3 \mid z_1, z_2) = \frac{f(z_1, z_2, z_3)}{f_{(Z_1,Z_2)}(z_1, z_2)} = \frac{1}{2\sqrt{1 - z_1^2 - z_2^2}}$$

if $z_3 \in [-\sqrt{1 - z_1^2 - z_2^2}, \sqrt{1 - z_1^2 - z_2^2}]$, and $= 0$ otherwise.

As a function of $z_3$, the last density is constant, so the conditional distribution under
consideration is uniform on $[-\sqrt{1 - z_1^2 - z_2^2}, \sqrt{1 - z_1^2 - z_2^2}]$.

Now, similar to what we did in Example 7.1.2-1, we can get that $E\{Z_3 \mid Z_1, Z_2\} = 0$, and
$E\{Z_3^2 \mid Z_1, Z_2\} = (1 - Z_1^2 - Z_2^2)/3$.


Proposition 13 The conditional expectation E{Y |X} defined above satisfies Properties 
1-4 from Section 7.2. 


7.4.2 On conditioning in the multi-dimensional case 


In this case, the formulas from Section 7.3.2 continue to be true with the natural
replacement of $X$ by the vector $\mathbf{X}$, and $x$ by $\mathbf{x}$. In particular, we can write the following counterpart of (7.3.11):

$$f_Z(z) = \int f_{Z|\mathbf{X}}(z \mid \mathbf{x}) f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x}, \tag{7.4.2}$$

where integration in the multidimensional integral above is carried out over all values of
the vector $\mathbf{x}$.


EXAMPLE 1. This nice example is a generalization of Example 7.3.2-2. Consider the
system of linear equations

$$AZ = b, \tag{7.4.3}$$

where $A = \{a_{ij}\}$ is an $n \times n$ matrix, $b = (b_1, \ldots, b_n)$ is an $n$-dimensional vector, and the
$n$-dimensional vector $Z = (Z_1, \ldots, Z_n)$ is the vector of unknowns, that is, $Z$ is a solution to
(7.4.3).

Assume that all $a_{ij}$ and $b_i$ are independent standard normal r.v.’s. We prove that in this
case, as in Example 7.3.2-2, each $Z_i$ has the Cauchy distribution.




In view of symmetry, it suffices to consider $Z_1$. Let $A_i$ be the cofactor of the element $a_{1i}$.
By the well known formula for solutions of the system of linear equations,

$$Z_1 = \frac{b_1 A_1 + b_2 A_2 + \ldots + b_n A_n}{a_{11} A_1 + a_{12} A_2 + \ldots + a_{1n} A_n}.$$

Since all r.v.’s $a_{1i}$ and $(A_1, \ldots, A_n)$ have continuous distributions, the probability that the
denominator will be equal to zero is zero, so we may exclude this case from consideration.
For the same reason, we can divide the numerator and the denominator by $\sqrt{A_1^2 + \ldots + A_n^2}$
and write

$$Z_1 = \frac{(b_1 A_1 + b_2 A_2 + \ldots + b_n A_n) / \sqrt{A_1^2 + \ldots + A_n^2}}{(a_{11} A_1 + a_{12} A_2 + \ldots + a_{1n} A_n) / \sqrt{A_1^2 + \ldots + A_n^2}}. \tag{7.4.4}$$

Consider the conditional distribution of $Z_1$ given the vector $\mathbf{A} = (A_1, \ldots, A_n)$. Note that the
$b$’s and $a_{1i}$’s are independent of each other and do not depend on $\mathbf{A}$. Then, once $(A_1, \ldots, A_n)$
is given, the numerator and the denominator in (7.4.4) are independent.

Furthermore, given $\mathbf{A}$, the r.v. $b_1 A_1 + b_2 A_2 + \ldots + b_n A_n$ is the sum of independent normal
r.v.’s. Hence, this sum is normal. Its mean is zero (since each $b_i$ has zero mean), and given
$(A_1, \ldots, A_n)$, the conditional variance of this sum is $A_1^2 + \ldots + A_n^2$. Consequently, dividing by
$\sqrt{A_1^2 + \ldots + A_n^2}$ we normalize the sum (see also Section 2.6), making its variance equal
to one. Thus, given $\mathbf{A} = (A_1, \ldots, A_n)$, the numerator in (7.4.4) is a standard normal r.v.

By the same argument, the denominator in (7.4.4) is also standard normal. Thus, given $\mathbf{A}$,
the conditional distribution of $Z_1$ is the distribution of the ratio of two independent standard
normal r.v.’s. By the result of Example 7.3.2-2, this is the Cauchy distribution.

So, we have arrived at a remarkable fact. The conditional distribution of $Z_1$ given $\mathbf{A}$ does
not depend on $\mathbf{A}$ at all and is equal to the Cauchy distribution. Then the unconditional
distribution of $Z_1$ must also be equal to the same Cauchy distribution. Formally, we can
prove it as follows. By virtue of (7.4.2),

$$f_{Z_1}(x) = \int f_{Z_1|\mathbf{A}}(x \mid \mathbf{u}) f_{\mathbf{A}}(\mathbf{u})\,d\mathbf{u},$$

where $\mathbf{u} = (u_1, \ldots, u_n)$, and integration is over all possible values of $\mathbf{u}$. As was proved,
$f_{Z_1|\mathbf{A}}(x \mid \mathbf{u}) = 1/[\pi(1 + x^2)]$. Consequently,

$$f_{Z_1}(x) = \int \frac{1}{\pi(1 + x^2)} f_{\mathbf{A}}(\mathbf{u})\,d\mathbf{u} = \frac{1}{\pi(1 + x^2)} \int f_{\mathbf{A}}(\mathbf{u})\,d\mathbf{u} = \frac{1}{\pi(1 + x^2)},$$

because the integral of any density equals one.
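For a small system, this remarkable fact can be observed in a simulation. The sketch below treats only the case $n = 2$, solves the system for the first unknown by Cramer's rule, and compares the frequency of $\{Z_1 \le z\}$ with the Cauchy d.f. $\frac{1}{2} + \frac{1}{\pi}\arctan z$. The function names, the number of trials, and the seed are assumptions made for illustration; larger $n$ would require a general linear solver.

```python
import math
import random

def cauchy_cdf(z):
    """d.f. corresponding to the density 1 / (pi * (1 + z**2))."""
    return 0.5 + math.atan(z) / math.pi

def first_unknown_cdf(z, trials=100_000, seed=11):
    """Monte Carlo estimate of P(Z_1 <= z) for a 2x2 system with standard normal entries."""
    rng = random.Random(seed)
    hits = valid = 0
    for _ in range(trials):
        a11, a12, a21, a22, b1, b2 = (rng.gauss(0.0, 1.0) for _ in range(6))
        det = a11 * a22 - a12 * a21
        if det == 0.0:  # an event of probability zero; skipped for safety
            continue
        valid += 1
        z1 = (b1 * a22 - a12 * b2) / det  # Cramer's rule for the first unknown
        if z1 <= z:
            hits += 1
    return hits / valid

for z in (-1.0, 0.0, 1.0):
    print(z, first_unknown_cdf(z), cauchy_cdf(z))
```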


7.4.3 On the infinite-dimensional case 


The situation becomes more complicated when $\mathbf{X}$ is an infinite-dimensional vector. For
example, in Chapters 4 and 5, we study random processes $X_t$ where time $t$ is continuous.
In many problems, we need to consider the conditional expectations of a r.v. $Y$ given a
realization of a process $X_t$ until time $T$, which may be written as $E\{Y \mid X_t, 0 \le t \le T\}$.
Note that the “conditioning part” involves all $X_t$ for $t \in [0, T]$.

The significance of such a conditional expectation is absolutely the same as above, and
it may be treated as above, including Properties 1-4 from Section 7.2. However, in a
mathematically rigorous exposition, we cannot proceed in the same way as before. One reason
is that the notion of a density $f(\mathbf{x})$ presupposes that $\int f(\mathbf{x})\,d\mathbf{x} = 1$, where integration is over
all $\mathbf{x}$’s. Thus, we must define what integration in an infinite-dimensional space means. This
requires additional constructions.

In this case, practically the only way out is to appeal to a general definition of conditional
expectation based on measure theory. However, it is important to realize that for
applications, we need only two things:


— the very fact that the conditional expectation exists, and 
— that it satisfies the main properties 1-4 from Section 7.2. 


If we take these facts for granted, then there will be no problem in using the notion of 
conditional expectation in applied models. 


8 ELEMENTS OF THE THEORY OF INTEREST 


In all examples below we choose, for certainty, a year as the fundamental unit of time. 


8.1 Compound interest 


We begin with the notion of rate. Let us view a variable C; as the capital of an investor 
at time t. For an interval |t,t +h], set AC; = C;+n — Cr, the absolute change of C; over this 
interval. If the relative change 

AC; 
C: 
we call k the rate of change over this interval; more precisely, the average rate. 

Now, consider the case when C; changes only m times a year at regular moments in time 
at an average constant rate 6. Then, we may divide each year into intervals of length t, 
and in accordance with (8.1.1), for each such interval we can write 


— kh, (8.1.1) 


ae =0 S (8.1.2) 
C; m 
It is noteworthy that, while we consider intervals whose lengths are less than one year, the
coefficient \(\delta\) in (8.1.2) continues to be a rate per unit of time; that is, an annual rate.

Assume, for example, that we deposit an amount \(C_0\) into a bank account, and the bank
credits (or compounds) interest monthly proceeding from an annual rate \(\delta\) equal, say, to
5%. The bank will carry out crediting or compounding in the following way.

The bank will divide the rate \(\delta\) by 12, and at the end of the first month the interest \(C_0 \cdot \frac{\delta}{12}\)
will be credited. This corresponds to the general rule (8.1.2) except for the fact that months
have slightly different durations.




So, the initial capital \(C_0\) increases to \(C_0(1+\delta/12)\). Next month, this amount will again be
multiplied by \((1+\delta/12)\), the total amount will equal \(C_0(1+\delta/12)^2\), and so on, up to the end
of the year when the final amount will be equal to
\(C_0(1+\delta/12)^{12} = C_0(1+0.05/12)^{12} \approx C_0 \cdot 1.0512\).
We see that the real annual profit per one unit of money will be equal to
\(i = (1+\delta/12)^{12} - 1 \approx 0.0512 = 5.12\%\), which is a bit larger than the rate 5%.

If, keeping the same annual rate \(\delta\), the bank compounds interest not twelve but \(n\) times a
year, it will lead to the amount \(C_0(1+\delta/n)^n\).

The quantity \(i = (1+\delta/n)^n - 1\) is called an (annual) effective interest rate and shows the
real growth over a year. Another term for the same quantity is yield. The reader can find
both words, rate and yield, in her/his bank statement and compare the numbers there.

It is noteworthy that when changing the number of the moments of compounding (or the
number of conversion periods), we maintain the same rate \(\delta\). In Section 8.2, we will come
back to this issue and consider the case when \(i\) is fixed while the rate is changing.


As is well known from Calculus, the function \((1+\delta/n)^n\) is increasing in \(n\). So, the yield
is larger than the interest rate, and the more often the interest is compounded, the better for
the investor.

Furthermore, \((1+\delta/n)^n \to e^{\delta}\) as \(n \to \infty\). (The reader remembers that \(e\) is defined as
\(\lim_{n\to\infty}(1 + 1/n)^n\).) If \(n = \infty\), we say that the interest is compounded continuously. The
annual growth factor in this case is \(e^{\delta}\), the effective interest (or yield) is \(i = e^{\delta} - 1\), and at
the end of the year, the capital invested becomes equal to \(C_1 = C_0 e^{\delta}\).
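
The growth of the effective rate with the number of compounding periods, and its convergence to the continuous-compounding limit, can be checked numerically. The following is only a minimal sketch in Python; the function names `effective_rate` and `effective_rate_continuous` are illustrative choices, not notation from the text.

```python
import math


def effective_rate(delta: float, n: int) -> float:
    """Effective annual rate i = (1 + delta/n)**n - 1 for n compoundings a year."""
    return (1.0 + delta / n) ** n - 1.0


def effective_rate_continuous(delta: float) -> float:
    """Limit of the effective rate as n grows without bound: i = e**delta - 1."""
    return math.exp(delta) - 1.0


if __name__ == "__main__":
    delta = 0.05  # the 5% annual rate used in the text
    for n in (1, 2, 4, 12, 365):
        print(f"n = {n:3d}: i = {effective_rate(delta, n):.6f}")
    print(f"continuous: i = {effective_rate_continuous(delta):.6f}")
```

Running the script prints one line per compounding frequency, so the monotone growth in the number of periods can be read off directly.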

In the case of continuous compounding, we call the characteristic \(\delta\) also the force of
interest. The interest rate and the force of interest are synonyms in this case.

At the end of \(t\) years, the capital will become equal to \(C_t = C_0 e^{\delta t}\). One may show this
by reasoning as above, but a better and more general way is to proceed as follows.

Let us come back to the relation (8.1.1) and consider an infinitesimally small interval
\([t, t + dt]\). Then (8.1.1) may be written as

\[\frac{dC_t}{C_t} = \delta\,dt. \tag{8.1.3}\]

This may be viewed as a differential equation for \(C_t\). The reader remembers (or can
readily verify) that a solution is

\[C_t = C_0 e^{\delta t}, \tag{8.1.4}\]

where \(C_0\) is the value of the function \(C_t\) at \(t = 0\).


From (8.1.4) it again follows that the annual interest rate \(i = \frac{C_1 - C_0}{C_0}\) is equal to

\[i = e^{\delta} - 1. \tag{8.1.5}\]

Making use of the differential equation (8.1.3) allows us to consider the case when the rate
\(\delta = \delta(t)\) is changing in time, which is closer to reality. Replacing in (8.1.3) the letter \(t\) by
\(s\), we can write it as

\[\frac{dC_s}{C_s} = \delta(s)\,ds. \tag{8.1.6}\]

Since \(\frac{dC_s}{C_s} = d(\ln C_s)\), integrating both sides of (8.1.6) from 0 to \(t\), we come to the relation

\[\ln C_t - \ln C_0 = \int_0^t \delta(s)\,ds.\]




From this it follows that

\[C_t = C_0 \exp\left\{\int_0^t \delta(s)\,ds\right\}. \tag{8.1.7}\]

If \(\delta(s)\) equals a constant \(\delta\), we come back to (8.1.4).


EXAMPLE 1. Let the rate \(\delta(s)\) increase linearly during a year from 4% to 6%. Find the
annual effective interest (or yield). In this case, \(\delta(s) = 0.04 + 0.02s\) for \(0 \le s \le 1\). Then
\(\int_0^1 \delta(s)\,ds = \int_0^1 (0.04 + 0.02s)\,ds = 0.05\). So, the yield equals

\[\exp\left\{\int_0^1 \delta(s)\,ds\right\} - 1 = e^{0.05} - 1 \approx 0.0513. \tag{8.1.8}\]

Assume now that the rate is growing within the same limits, but the “speed” of this
growth is not a constant, and is slower in the beginning. For example, \(\delta(s) = 0.04 + 0.02s^2\).
Then, as is easy to compute, \(\int_0^1 \delta(s)\,ds = 0.0466\ldots\), and the yield is equal to
\(\exp\{0.0466\ldots\} - 1 \approx 0.0478\) [compare with (8.1.8)].
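
Formula (8.1.7) can also be evaluated numerically for a rate that varies in time. The sketch below is an illustration rather than part of the original exposition: it approximates the integral by the midpoint rule (a choice made here for simplicity), and the names `growth_factor`, `linear_rate` and `quadratic_rate` are hypothetical.

```python
import math
from typing import Callable


def growth_factor(delta: Callable[[float], float], t: float, steps: int = 10_000) -> float:
    """Approximate C_t / C_0 = exp(integral of delta(s) over [0, t]) by the midpoint rule."""
    h = t / steps
    integral = h * sum(delta((k + 0.5) * h) for k in range(steps))
    return math.exp(integral)


def linear_rate(s: float) -> float:
    """Rate growing linearly from 4% to 6% over the year."""
    return 0.04 + 0.02 * s


def quadratic_rate(s: float) -> float:
    """Rate growing from 4% to 6%, more slowly at the beginning."""
    return 0.04 + 0.02 * s ** 2


if __name__ == "__main__":
    print(f"yield, linear rate:    {growth_factor(linear_rate, 1.0) - 1.0:.4f}")
    print(f"yield, quadratic rate: {growth_factor(quadratic_rate, 1.0) - 1.0:.4f}")
```

The two printed yields can be compared with the values obtained analytically in the example; any other integrable rate function can be passed in the same way.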


In general, the varying rate \(\delta(s)\) depends on the market and, consequently, is random. In
this case, \(\delta(s)\) is a random process, and \(C_t\) is a random process too.

An interesting question is whether we underestimate or overestimate the expected income
\(E\{C_t\}\) if we replace the random rate \(\delta\) by its mean value \(E\{\delta\}\).

The answer follows from Jensen's inequality which we will prove and discuss in detail
in Section 1.3.4.2. In particular, from this inequality it follows that if \(u(x)\) is a convex
function, then for any r.v. \(X\)

\[E\{u(X)\} \ge u(E\{X\})\]

(provided that all quantities above are finite). Since \(e^x\) is a convex function,

\[E\{C_t\} = C_0 E\left\{\exp\left\{\int_0^t \delta(s)\,ds\right\}\right\} \ge C_0 \exp\left\{E\left\{\int_0^t \delta(s)\,ds\right\}\right\} = C_0 \exp\left\{\int_0^t E\{\delta(s)\}\,ds\right\}.\]

Thus, we underestimate the mean value of the yield by replacing \(\delta\) by \(E\{\delta\}\). More precisely,
since the annual yield is \((C_1/C_0) - 1\), the expected annual yield is

\[E\left\{\frac{C_1}{C_0}\right\} - 1 \ge \exp\left\{\int_0^1 E\{\delta(s)\}\,ds\right\} - 1.\]


EXAMPLE 2. Assume that for a “good” year, the annual rate is 8%, while for a “bad”
year, the rate is 2%. Suppose that both scenarios are equally likely. In this case, the r.v.
\(\delta(s)\) does not depend on \(s\), and if \(C_0 = 1\), then

\[E\{C_1\} = \frac{1}{2}e^{0.08} + \frac{1}{2}e^{0.02} \approx 1.0517, \quad \text{while} \quad e^{E\{\delta\}} = e^{0.05} \approx 1.0513.\]


The difference is not large but may be significant for large investments. 
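
The comparison in Example 2 is easy to reproduce. The snippet below is a small sketch under the assumptions of that example (two equally likely scenarios and unit initial capital); the names `SCENARIOS`, `expected_growth` and `growth_at_mean_rate` are introduced here only for illustration.

```python
import math

# (rate, probability) pairs for the two equally likely scenarios of Example 2.
SCENARIOS = [(0.08, 0.5), (0.02, 0.5)]


def expected_growth(scenarios):
    """E{exp(delta)}: the expected one-year growth factor when C_0 = 1."""
    return sum(p * math.exp(delta) for delta, p in scenarios)


def growth_at_mean_rate(scenarios):
    """exp(E{delta}): the growth factor obtained by plugging in the mean rate."""
    return math.exp(sum(p * delta for delta, p in scenarios))


if __name__ == "__main__":
    print(f"E{{C_1}}       = {expected_growth(SCENARIOS):.4f}")
    print(f"exp(E{{delta}}) = {growth_at_mean_rate(SCENARIOS):.4f}")
```

By Jensen's inequality the first number can never be smaller than the second, whatever scenarios and probabilities are supplied.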




8.2 Nominal rate 


Next, we look at the same phenomenon from another point of view. Assume that the
effective annual interest rate \(i\) is fixed, and the bank compounds interest \(m\) times a year.
With what annual rate should the bank compound interest in order to maintain the given
effective interest rate (yield) \(i\)?

Such an annual rate is usually denoted by \(i^{(m)}\) and is called a nominal annual interest
rate (more precisely, an annual rate of interest payable (or convertible) \(m\)-thly). As follows
from (8.1.2) and the discussion at the beginning of Section 8.1, the bank will divide the rate
\(i^{(m)}\) by \(m\), and at the end of each period, the current capital will be multiplied by \(1 + i^{(m)}/m\).
In order for such a compounding to lead to the annual interest \(i\), we should have

\[\left(1 + \frac{i^{(m)}}{m}\right)^m = 1 + i.\]

This implies that

\[i^{(m)} = m\left[(1+i)^{1/m} - 1\right]. \tag{8.2.1}\]


Continuous compounding corresponds again to the case \(m \to \infty\). The limit

\[\lim_{m\to\infty} i^{(m)} = \lim_{m\to\infty}\left(m\left[(1+i)^{1/m} - 1\right]\right) = \ln(1+i). \tag{8.2.2}\]

(For example, one can set \(x = 1/m\) and apply L'Hôpital's rule.)

So, continuous compounding corresponds to the annual rate \(\delta = \ln(1+i)\), which is
consistent with (8.1.5).

Note also that \(i^{(m)}\) is decreasing from \(i^{(1)} = i\) to \(i^{(\infty)} = \delta = \ln(1+i)\) as \(m\) is growing
from 1 to infinity. (To prove that \(i^{(m)}\) is decreasing, it suffices to compute the derivative of
\(i^{(m)}\) in \(m\).)
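
The behavior of the nominal rate described above can be observed numerically. What follows is a minimal sketch assuming formula (8.2.1); the function names `nominal_rate` and `force_of_interest` are illustrative.

```python
import math


def nominal_rate(i: float, m: int) -> float:
    """Nominal annual rate convertible m-thly: m * ((1 + i)**(1/m) - 1), as in (8.2.1)."""
    return m * ((1.0 + i) ** (1.0 / m) - 1.0)


def force_of_interest(i: float) -> float:
    """Limit of the nominal rate as m grows: delta = ln(1 + i)."""
    return math.log(1.0 + i)


if __name__ == "__main__":
    i = 0.05  # a fixed effective annual rate, chosen for illustration
    for m in (1, 2, 4, 12, 365):
        print(f"m = {m:3d}: nominal rate = {nominal_rate(i, m):.6f}")
    print(f"limit:   delta        = {force_of_interest(i):.6f}")
```

Compounding the printed nominal rate \(m\) times a year recovers the same effective rate, which is the defining property of the nominal rate.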


8.3 Discount and annuities 


We define a discount factor or simply discount \(v_t\) as the value of one unit of money to be
paid at time \(t\) if evaluation is carried out from the standpoint of the present time \(t = 0\). In
another terminology, \(v_t\) is the present value of one unit of money to be paid at time \(t\).

The question of how to calculate a discount is rather complicated, if it is even solvable,
since it involves many issues such as inflation, different choices of investment, randomness
of investment results, etc.

Roughly, \(v_t\) may be defined as the amount of money which one should invest into a
risk-free security at time \(t = 0\), in order to have a unit of money at time \(t\).

Consider a unit time interval (say, a year), and assume that the risk-free effective interest
rate for this period is well defined and equals \(i\). Then, investing \(\frac{1}{1+i}\) units of money, the
investor will obtain \(\frac{1}{1+i} \cdot (1+i) = 1\) at the end of the period. Thus, to get one unit at the
end, one should invest \(\frac{1}{1+i}\) at the beginning. Hence,

\[v_1 = v = \frac{1}{1+i}.\]




Let time be discrete, that is, \(t = 0, 1, \ldots\), and let the risk-free interest in each period
\([t-1, t]\) be the same and equal \(i\). Then the present value of one unit to be paid at the end of
the period \([t-1, t]\), that is, at time \(t\), is

\[v_t = \left(\frac{1}{1+i}\right)^t = v^t, \tag{8.3.1}\]

where \(t\) is an integer.


Assume that an investor expects a future cash flow during \(n\) periods of time with
payments at the beginning of each period. Such a sequence of payments is called an
annuity-due. Denote by \(c_t\) the cash at time \(t\); that is, at the beginning of the period \([t, t+1)\). The
present value of this payment is \(v^t c_t\). The total cash flow is the sequence \(c_0, c_1, c_2, \ldots, c_{n-1}\),
and the present value of this flow is the number

\[C_n = c_0 + v c_1 + v^2 c_2 + \ldots + v^{n-1} c_{n-1} = \sum_{t=0}^{n-1} v^t c_t.\]

In the theory of interest, for the particular case of \(c_t = 1\), the quantity \(C_n\) is denoted by \(\ddot{a}_{\overline{n}|}\),
and we adopt this notation in this book. (The dots in \(\ddot{a}\) indicate that payments are provided
in the beginning of each period. See also Sections 9.1, 9.2.1 where we discuss annuities in
much more detail.) Thus,

\[\ddot{a}_{\overline{n}|} = 1 + v + \ldots + v^{n-1} = \frac{1 - v^n}{1 - v}, \tag{8.3.2}\]

provided \(v \ne 1\). This quantity is called the present value of an annuity-due with unit rate,
or simply an annuity-due. (More precisely, a certain annuity-due, because for now we deal
with a non-random cash flow.)

For \(v < 1\) and an infinite horizon \(n = \infty\), the limit in (8.3.2) is equal to

\[\ddot{a}_{\overline{\infty}|} = \frac{1}{1 - v}. \tag{8.3.3}\]

This quantity is also called the present value of a perpetuity-due or, making it shorter, a
perpetuity-due.

If payments are made at moments \(t = 1, \ldots, n\), that is, at the ends of the periods, the
present value of the cash flow equals

\[v + \ldots + v^n = v\,\frac{1 - v^n}{1 - v}.\]

This quantity is denoted by \(a_{\overline{n}|}\). It is called the present value of an annuity-immediate (or
payable in arrears), or simply an annuity-immediate. The perpetuity-immediate is given
by

\[a_{\overline{\infty}|} = \frac{v}{1 - v},\]

provided \(v < 1\).
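
The annuity formulas above can be checked against a direct summation of discounted payments. The following sketch is illustrative only; the function names are choices made here, and `present_value` simply implements the sum of \(v^t c_t\) for a cash flow starting at time 0.

```python
def present_value(cash_flow, v):
    """Present value of payments c_0, c_1, ... made at times 0, 1, ... with discount v."""
    return sum(c * v ** t for t, c in enumerate(cash_flow))


def annuity_due(v, n):
    """Closed form (1 - v**n) / (1 - v) for unit payments at times 0, ..., n-1."""
    return (1.0 - v ** n) / (1.0 - v)


def annuity_immediate(v, n):
    """Closed form v * (1 - v**n) / (1 - v) for unit payments at times 1, ..., n."""
    return v * (1.0 - v ** n) / (1.0 - v)


def perpetuity_due(v):
    """Limit of the annuity-due as n grows, valid for v < 1."""
    return 1.0 / (1.0 - v)


def perpetuity_immediate(v):
    """Limit of the annuity-immediate as n grows, valid for v < 1."""
    return v / (1.0 - v)


if __name__ == "__main__":
    v = 1.0 / 1.05  # discount factor for an assumed 5% effective rate
    print(present_value([1.0] * 10, v), annuity_due(v, 10))
    print(annuity_immediate(v, 10), perpetuity_due(v), perpetuity_immediate(v))
```

An annuity-immediate is the corresponding annuity-due shifted by one period, which is why the two closed forms differ exactly by the factor \(v\).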
Now, let us consider the continuous time case. Assume that an investment grows in
accordance with (8.1.7). Then, to find \(v_t\), we should set in (8.1.7) \(C_t = 1\) and \(C_0 = v_t\),
which implies that

\[v_t = \exp\left\{-\int_0^t \delta(s)\,ds\right\}. \tag{8.3.4}\]




If the rate \(\delta(s)\) equals a constant \(\delta\), then (8.3.4) implies that

\[v_t = e^{-\delta t},\]

and we come to

\[v_t = v^t, \tag{8.3.5}\]

where \(v = e^{-\delta}\).

The relation (8.3.5) is adopted in many models not necessarily based on continuous
compounding of interest; that is, \(v\) is not necessarily presented as \(e^{-\delta}\). The term discount is
usually applied to the characteristic \(v\).


8.4 Accumulated value 


Consider a series of payments of a unit of money at discrete time moments \(t = 0, 1, \ldots, n-1\).
The unit paid at moment \(t\) “starts to grow” and at time \(n\) becomes equal to \((1+i)^{n-t}\),
where \(i\) is the (annual) interest, or more precisely, the effective interest rate. Hence, the
total accumulated value at time \(n\) is equal to the quantity

\[\ddot{s}_{\overline{n}|} = \sum_{t=0}^{n-1} (1+i)^{n-t} = \frac{(1+i)\left((1+i)^n - 1\right)}{i}. \tag{8.4.1}\]

Since \(v = \frac{1}{1+i}\), we can write

\[\ddot{s}_{\overline{n}|} = \frac{(1/v)^n - 1}{1 - v} = \frac{1 - v^n}{v^n (1 - v)}. \tag{8.4.2}\]

Note that we could come to the same formula just dividing the annuity-due \(\ddot{a}_{\overline{n}|}\) from (8.3.2)
by \(v^n\). Indeed, \(\ddot{a}_{\overline{n}|}\) is the value of the cash flow from the standpoint of the time \(t = 0\), while
\(\ddot{s}_{\overline{n}|}\) is the value of the same cash flow from the standpoint of the time \(t = n\). The former
value is equal to the latter value times the discount factor \(v^n\).
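
The relation between the accumulated value and the annuity-due can be verified numerically. This is a minimal sketch with hypothetical function names; it compares the direct sum, the closed form, and the annuity-due divided by \(v^n\).

```python
def accumulated_value(i, n):
    """Direct sum of (1 + i)**(n - t) over the payment times t = 0, ..., n-1."""
    return sum((1.0 + i) ** (n - t) for t in range(n))


def accumulated_value_closed(i, n):
    """Closed form (1 + i) * ((1 + i)**n - 1) / i of the same sum."""
    return (1.0 + i) * ((1.0 + i) ** n - 1.0) / i


def annuity_due(i, n):
    """Present value (1 - v**n) / (1 - v) with v = 1 / (1 + i)."""
    v = 1.0 / (1.0 + i)
    return (1.0 - v ** n) / (1.0 - v)


if __name__ == "__main__":
    i, n = 0.05, 10  # illustrative values, not taken from the text
    v = 1.0 / (1.0 + i)
    print(accumulated_value(i, n))
    print(accumulated_value_closed(i, n))
    print(annuity_due(i, n) / v ** n)
```

All three printed numbers should agree up to rounding error, since they express the value of the same cash flow at time \(n\).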


8.5 Effective and nominal discount rates 


Let \(i\) be an effective annual interest rate. Then the discount factor \(v = \frac{1}{1+i}\) [see (8.3.1)].
A person investing a unit of money at the beginning of a year will be credited \(i\) units at
the end of the year. The present value of the interest payment \(i\) from the standpoint of the
initial time is \(vi = \frac{i}{1+i}\). The quantity

\[d = \frac{i}{1+i} \tag{8.5.1}\]

is called an effective rate of discount or a rate of interest-in-advance. The latter term is
connected to the following interpretation. Assume that the interest is paid in advance at the
beginning of the year. For this to be equivalent to the payment of \(i\) at the end of the year,
the payment in advance should be equal to the present value of the amount \(i\) paid at the end
of the year. The present value mentioned is equal to \(\frac{1}{1+i} \cdot i = \frac{i}{1+i}\).

It is easy to see also that

\[d = 1 - v.\]




The last relation may be interpreted as follows. The quantity \(v\) is what you should invest at
the beginning of the year to get one unit at the end of the year. So, \(d\) is an investment profit,
more precisely, a profit rate since we are talking about a unit of money.

Let \(d^{(m)}\) be the equivalent annual nominal rate of interest-in-advance credited \(m\) times in
a year. Another term is a nominal rate of discount convertible \(m\)-thly. In other words, \(d^{(m)}\)
is an annual discount rate which leads to the effective annual interest rate \(i\), and hence to
the effective annual rate of discount \(d\), if interest is compounded \(m\) times a year.

Since \(d^{(m)}\) is an annual rate, in accordance with rule (8.1.2) which we apply now to
payments in advance, the rate for each period of length \(\frac{1}{m}\) should be \(\tilde{d}^{(m)} = d^{(m)}/m\). As
we know (see Section 8.2), in order to get the interest \(i\) at the end of the year, the effective
interest rate in each such period should be equal to \(\tilde{i}^{(m)} = i^{(m)}/m\), where \(i^{(m)}\) is the nominal
annual interest rate. Then, in accordance with (8.5.1),

\[\tilde{d}^{(m)} = \frac{\tilde{i}^{(m)}}{1 + \tilde{i}^{(m)}}, \quad \text{and} \quad d^{(m)} = m\,\tilde{d}^{(m)} = \frac{i^{(m)}}{1 + i^{(m)}/m}.\]

Substituting \(i^{(m)} = m[(1+i)^{1/m} - 1]\) from (8.2.1), after simple algebra we get that

\[d^{(m)} = m\left(1 - (1+i)^{-1/m}\right) = m\left(1 - v^{1/m}\right). \tag{8.5.2}\]

Furthermore,

\[d^{(m)} \to \delta \ \text{ as } \ m \to \infty, \tag{8.5.3}\]

where \(\delta = \ln(1+i)\). This is certainly not surprising since the case \(m = \infty\) corresponds to
continuous compounding with rate \(\delta\). (To prove (8.5.3) one can set \(x = 1/m\), \(v = e^{-\delta}\) and
apply L'Hôpital's rule.)
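
The limit (8.5.3) and the relation between the discount rates can be illustrated numerically. The sketch below assumes formulas (8.5.1) and (8.5.2); the function names `effective_discount` and `nominal_discount` are introduced here for illustration.

```python
import math


def effective_discount(i: float) -> float:
    """Effective rate of discount d = i / (1 + i), as in (8.5.1)."""
    return i / (1.0 + i)


def nominal_discount(i: float, m: int) -> float:
    """Nominal rate of discount convertible m-thly: m * (1 - (1 + i)**(-1/m)), as in (8.5.2)."""
    return m * (1.0 - (1.0 + i) ** (-1.0 / m))


if __name__ == "__main__":
    i = 0.05  # an assumed effective annual interest rate
    for m in (1, 2, 4, 12, 365):
        print(f"m = {m:3d}: d^(m) = {nominal_discount(i, m):.6f}")
    print(f"limit:   delta = {math.log(1.0 + i):.6f}")
```

For \(m = 1\) the nominal rate of discount coincides with the effective rate \(d\), and the printed values increase toward \(\delta\) as \(m\) grows.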


9 EXERCISES 


Exercises in this chapter concern only Section 7. 


1. We say that \(Y\) does not depend on an event \(A\) if \(Y\) and \(1_A\) are independent. Show that in this
case \(E\{Y \mid A\} = E\{Y\}\). (Advice: Start with \(E\{Y \mid A\} = \frac{1}{P(A)}E\{Y; A\} = \frac{1}{P(A)}E\{Y 1_A\}\).)


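2. The simple assertion of Exercise 1 may be generalized in the following way. For two events
\(A_1\) and \(A_2\), if the vector \((Y, 1_{A_1})\) does not depend on \(1_{A_2}\), then \(E\{Y \mid A_1A_2\} = E\{Y \mid A_1\}\).
Show this. (Advice: Start with \(E\{Y \mid A_1A_2\} = \frac{1}{P(A_1A_2)}E\{Y 1_{A_1} 1_{A_2}\} = \frac{1}{P(A_1)P(A_2)}E\{Y 1_{A_1} 1_{A_2}\}\).)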


3. Graph the marginal density (7.1.10) from Example 7.1.2-1. Do you see without calculations 
that the integral of the function in (7.1.10) is indeed equal to one? Find E{Z}|Z}. Write 
E{Z3|Zi}. 


4. In the situation of Example 7.1.2-2, find the conditional density \(f_{X_1 \mid S}(x \mid s)\) when

(a) \(X_1\) and \(X_2\) are standard normal;


(b) \(X_1\) and \(X_2\) have the \(\Gamma\)-distributions with parameters \((a, \nu_1)\) and \((a, \nu_2)\), respectively.
In this case, the distribution we will get is called the B (beta)-distribution. Write a
particular formula for the conditional density for \(\nu_1 = \nu\), \(\nu_2 = 1\), and \(E\{X_1 \mid S\}\) in this
case. Does it grow when \(\nu\) is increasing? To what does it converge as \(\nu \to \infty\)? Interpret
this with regard to the form of the \(\Gamma\)-density \(f_{a\nu}(x)\).


5. Graph the marginal density (7.4.1) from Example 7.4-1. Do you see without calculations
that the total integral of this function (with respect to both arguments) is indeed equal to one?
Find E{Z}|Z1,Z2}. Write E{Z3 | Z1, Z2}.


6. Let the joint density of \((X,Y)\) be \(f(x,y) = x + y\) for \(0 \le x \le 1\), \(0 \le y \le 1\), and \(= 0\) otherwise.
Find the conditional density \(f(y \mid x)\), and \(E\{Y \mid X\}\). At which \(x\) does the function \(m(x)\) attain
its maximum? Find \(E\{X \mid Y\}\).


7. Let the joint density of \((X,Y)\) be \(f(x,y) = 2ye^{-x}/x^2\) for \(0 \le y \le x\), and \(= 0\) otherwise. Find
the conditional density \(f(y \mid x)\) and \(E\{Y \mid X\}\).


8. Derive the formula (7.3.11) directly from the definition of \(f(z \mid x)\). (Advice: Rewrite the
definition of \(f(z \mid x)\) as \(f(x,z) = f(z \mid x) f_X(x)\) and integrate in \(x\).)


9. A clerk in the claim department of an insurance company is waiting for the next customer
call. It is known that the waiting time for the next man is exponentially distributed with
mean \(m_1\), while for a woman this r.v. is exponential with mean \(m_2\). Men and women call
independently. Find the probability that the first call will be from a man.


10. Let \(X_1\) and \(X_2\) be i.i.d. r.v.'s. Assume that we know values of \(X_{\min} = \min\{X_1, X_2\}\), and
\(X_{\max} = \max\{X_1, X_2\}\). What can we say about the expected value of, for instance, \(X_1\) given
this information? (Advice: Note that by symmetry \(E\{X_1 \mid X_{\min}, X_{\max}\} = E\{X_2 \mid X_{\min}, X_{\max}\}\).)


11. Let r.v.'s \(X\), \(Y\), and \(Z\) be mutually independent. Suppose \(X\) and \(Y\) are standard normal, while
the distribution of \(Z\) is arbitrary. Find the distribution of the r.v. \(\frac{X + YZ}{\sqrt{1 + Z^2}}\).


12. Suppose that the random number generator in your computer is perfect.

(a) You simulate one value of a r.v. \(X\) uniformly distributed on \([0,1]\). After that you
simulate \(n\) independent trials with the probability of success equal to the value of \(X\)
you got. Denote by \(Y\) the number of successes. Write \(E\{Y\}\) and \(\mathrm{Var}\{Y\}\). Find the
distribution of \(Y\). (Hint: While the first two questions are not very complicated, the
third requires some calculations. In particular, we should know that
\(\int_0^1 x^k (1-x)^m\,dx = k!\,m!/(k+m+1)!\).)

(b) You simulate one value of a r.v. \(N\) having the Poisson distribution with parameter \(\lambda\).
After that, you simulate independent trials with a fixed probability \(p\) of success. The
number of trials equals the value of \(N\) you got. Let \(Y\) be the number of successes.
Find \(E\{Y\}\), \(\mathrm{Var}\{Y\}\) and the distribution of \(Y\). In this book, we will consider this
classical problem several times using different methods. Now our goal is to practice
conditioning.


Chapter 1 


Comparison of Random Variables. 
Preferences of Individuals 


This chapter concerns various rules of comparison and subsequent selection among risky 
alternatives. 


1 A GENERAL FRAMEWORK AND FIRST CRITERIA 
1.1 Preference order 


What do we usually do when we choose an investment strategy in the presence of un- 
certainty? Consciously or not, we compare random variables (r.v.’s) of the future income, 
corresponding to different possible strategies, and we try to figure out which of these r.v.’s 
is the “best”. 

Suppose you are one of 2 million people who buy a lottery ticket to win a single
one-million-dollar prize. Your gain is a random variable (r.v.)

\[\xi = \begin{cases} 1{,}000{,}000 & \text{with probability } 1/2{,}000{,}000, \\ 0 & \text{with probability } 1 - 1/2{,}000{,}000. \end{cases}\]

If the ticket's price is $1, then your random profit is \(\xi - 1\). If you have decided to buy the
ticket, it means that, when comparing the r.v.'s \(X = \xi - 1\) and \(Y = 0\) (the profit if you do not
buy a ticket), you have decided, perhaps at an intuitive level, that \(X\) is better for you than
\(Y\).

The fact that the mean value \(E\{X\} = E\{\xi\} - 1 = \frac{1}{2} - 1 = -\frac{1}{2}\) is negative does not say
that the decision is unreasonable. You pay for hope or for fun.

Suppose you buy auto insurance against a possible future loss \(\xi\). Assume that with
probability 0.9 the r.v. \(\xi = 0\) (nothing happened), and with probability 0.1, the loss \(\xi\) takes
on values between $0 and $2000, and all these values are equally likely. In this case,
\(E\{\xi\} = 0.1 \cdot 1000 = 100\). If the premium \(c\) you pay is equal, say, to $110, it means that the
loss of \(\xi\) is worse for you than the loss of the certain amount \(c = 110\). The fact that you pay
$10 more than the mean loss, again, does not necessarily mean that you made a mistake.
The additional $10 may be viewed as a payment for stability.

For the insurance company the decision in this case is, in a sense, the opposite. The
company gets your premium \(c\), and it will pay you a random payment \(\xi\). The company
compares the r.v. \(X = c - \xi\) with the r.v. \(Y = 0\), and if the company signs the insurance
contract, it means that it has decided that the random income \(X\) is better than zero income.

In the reasoning above, we assumed that the decision did not depend on the total wealth of
the client or the company but just on the r.v.'s under comparison. Suppose that in the case
of insurance you also take into account your total wealth or a part of it, which we denote
by \(w\). Then the r.v.'s under comparison are \(w - \xi\) (your wealth if you do not insure the loss)
and \(w - c\) (your wealth if you do insure the loss for the premium \(c\)).

In the examples we considered, one of the variables under comparison was non-random.
Certainly, this is not always the case. For example, if you decide to insure only half of the
future loss \(\xi\) for a lower premium \(c'\), then the r.v.'s we should consider are \(X = w - \xi\) (you
do not buy an insurance) and \(Y = w - \frac{\xi}{2} - c'\) (you insure half of the loss for \(c'\)).


This chapter addresses various criteria for the comparison of risky alternatives. As a 
rule, we will talk about possible values of future income. In this case, while the criteria 
may vary, they usually have one feature in common. When choosing a possible investment 
strategy, we have competing interests: we want the income to be large, but we also want the 
risk to be low. As a rule, we can reach a certain level of stability only by sacrificing a part 
of the income: we should pay for stability. So, our decision becomes a trade-off between 
the possible growth and stability. 

Let us consider the general framework where we deal with a fixed class \(\mathcal{X} = \{X\}\) of
r.v.'s \(X\). We assume that r.v.'s from \(\mathcal{X}\) are all defined on some sample space \(\Omega = \{\omega\}\) (see
Section 0.1.3.1). That is, \(X = X(\omega)\).

Defining a rule of comparison on the class \(\mathcal{X}\) means that for pairs \((X,Y)\) of r.v.'s from
\(\mathcal{X}\), we should determine whether \(X\) is better than \(Y\), or \(X\) is worse than \(Y\), or these two
random variables are equivalent for us.

Formally, this means that among possible pairs \((X,Y)\) (the order of the r.v.'s in the pair
\((X,Y)\) is essential), we specify a collection of those pairs \((X,Y)\) for which \(X\) is preferable
or equivalent to \(Y\). In other words, “\(X\) is not worse than \(Y\)”, and as a rule we will use the
latter terminology. We will write it as \(X \succsim Y\).

If \((X,Y)\) does not belong to the collection mentioned, we say “\(X\) is worse than \(Y\)” or
“\(Y\) is better than \(X\)”, writing \(X \prec Y\) or \(Y \succ X\), respectively. If simultaneously \(X \succsim Y\) and
\(Y \succsim X\), we say that “\(X\) is equivalent to \(Y\)”, writing \(X \sim Y\).

Not stating it each time explicitly, we will always assume that the relation \(\succsim\) satisfies the
following two properties.

(i) Completeness: For any \(X\) and \(Y\) from \(\mathcal{X}\), either \(X \succsim Y\), or \(Y \succsim X\). (As was mentioned,
these relations may hold simultaneously.)

(ii) Transitivity: For any \(X\), \(Y\), and \(Z\) from \(\mathcal{X}\), if \(X \succsim Y\) and \(Y \succsim Z\), then \(X \succsim Z\).

The rule of comparison so defined is called a preference order on the class \(\mathcal{X}\).

Before discussing examples, we state one general requirement on preference orders. This 
requirement is quite natural when we view X’s as the r.v.’s of future income. 


The monotonicity property: 


\[\text{If } X, Y \in \mathcal{X} \text{ and } P(X \ge Y) = 1, \text{ then } X \succsim Y. \tag{1.1.1}\]




This requirement reflects the rule “the larger, the better”. If a random income \(X = X(\omega)\)
is greater than or equal to a random income \(Y = Y(\omega)\) for all \(\omega\)'s or, at least, with probability
one, then for us \(X\) is not worse than \(Y\).

It makes sense to emphasize that in (1.1.1) we consider not all r.v.’s but only those from 
the class X under consideration. We will see later that this is an important circumstance. 

It is natural to consider also 


The strict monotonicity property: 


\[\text{If } X, Y \in \mathcal{X},\ P(X \ge Y) = 1, \text{ and } P(X > Y) > 0, \text{ then } X \succ Y. \tag{1.1.2}\]


The significance of this requirement is also clear. If a random income \(X = X(\omega)\) is not
smaller than \(Y = Y(\omega)\) with probability one, and with a positive probability \(X(\omega)\) is larger
than \(Y(\omega)\), then we prefer \(X\) to \(Y\).

In this book, we will accept only preference orders which are monotone in the class of
the r.v.'s under consideration. However, we will not always require strict monotonicity; see,
for example, the VaR criterion in Section 1.2.2. Nevertheless, if a rule of comparison is not
strictly monotone, this indicates some inflexibility of the rule, and it makes sense, at
least, to recheck to what extent it meets our goals.


EXAMPLE 1. Let two r.v.'s, \(X = X(\omega)\) and \(Y = Y(\omega)\), be defined on a sample space \(\Omega\)
consisting of only two outcomes: \(\omega_1\) and \(\omega_2\). The probabilities of the outcomes are equal to
1/2. We may view \(X, Y\) as the random income corresponding to two investment strategies,
and \(\omega_1, \omega_2\) as two states of the future market. Let


Clearly, \(X(\omega) \ge Y(\omega)\) for both \(\omega\)'s and for any monotone order \(\succsim\), we will have \(X \succsim Y\), i.e.,
\(X\) is not worse than \(Y\), which is natural. Suppose, however, that for an individual, her/his
preference order \(\succsim\) is monotone but is not strictly monotone, and though \(P(X > Y) = \frac{1}{2} > 0\),
the r.v. \(X\) is equivalent to \(Y\). This means that the individual is indifferent whether to choose
\(X\) or \(Y\). The only way to interpret it is to say that the individual needs at most two units
of money, and does not need more. If, as a matter of fact, it is not true, the individual's
preferences should be described in a more flexible way.


Let \(V(X)\) be a function taking on numerical values. We say that an order \(\succsim\) is preserved,
or completely characterized, by \(V(X)\) if for any \(X, Y \in \mathcal{X}\),

\[X \succsim Y \iff V(X) \ge V(Y), \tag{1.1.3}\]

where the symbol \(\iff\) means if and only if, also abbreviated iff.

The function \(V(X)\) may be viewed as a measure of the “quality” of \(X\): the larger \(V(X)\),
the better \(X\), and \(X\) is not worse than \(Y\) iff \(V(X) \ge V(Y)\).




Below, we consider various examples; but first it is worthwhile to note that in the case of
(1.1.3), the monotonicity property may be restated as

\[\text{If } X, Y \in \mathcal{X} \text{ and } P(X \ge Y) = 1, \text{ then } V(X) \ge V(Y). \tag{1.1.4}\]

The strict monotonicity is equivalent in this case to the property

\[\text{If } X, Y \in \mathcal{X},\ P(X \ge Y) = 1, \text{ and } P(X > Y) > 0, \text{ then } V(X) > V(Y). \tag{1.1.5}\]


Let us turn to examples. 


1.2 Several simple criteria 


We will talk about preferences of economic agents—separate individuals, companies, 
etc.—using also, for brevity, the term “investor”. 


1.2.1 The mean-value criterion 


The investor cares only about the mean values of r.v.'s, that is,

\[X \succsim Y \iff E\{X\} \ge E\{Y\}.\]

In this case, the collection of all pairs \((X,Y)\) mentioned above is just the collection of all
pairs \((X,Y)\) for which \(E\{X\} \ge E\{Y\}\), and in (1.1.3), \(V(X) = E\{X\}\).

Clearly, this criterion is strictly monotone; the reader is invited to show it on her/his own.
Note, however, that from the mean-value criterion's point of view, for example, r.v.'s

\[X = \begin{cases} 100 & \text{with probability } 1/2, \\ 0 & \text{with probability } 1/2, \end{cases} \quad \text{and} \quad Y = 50 \tag{1.2.1}\]

are equivalent. This might not reflect people's real preferences; so, the criterion may turn out
to be too simple and inflexible. However, as we will see, in some situations quite reasonable
comparison rules may turn out to be close to the mean-value criterion.


1.2.2 Value-at-Risk (VaR) 


Another term in use is the capital-at-risk criterion. For a r.v. \(X\), denote by \(q_\gamma = q_\gamma(X)\)
its \(\gamma\)-quantile or the \(100\gamma\)-th percentile. The reader is recommended to look up the rigorous
definition in Section 0.1.3.4, and especially Fig.0-7 there. Loosely speaking, it is the largest
number \(x\) for which \(P(X < x) \le \gamma\). If the r.v. \(X\) is continuous and its distribution function
(d.f.) \(F(x)\) is increasing, \(q_\gamma\) is just the number for which \(F(q_\gamma) = \gamma\). See also Fig.0.7a. The
discrete distribution case is illustrated in Fig.0.7de.

Let \(\gamma\) be a fixed level of probability, viewed as sufficiently small. Assume that an investor
does not take into consideration events whose probabilities are less than \(\gamma\). Then, for such
an investor the worst, smallest conceivable level of the income is \(q_\gamma\).




Let, for instance, \(\gamma = 0.05\). Then \(q_{0.05}\) is the smallest value of the income among all
values which may occur with 95% probability. One may say that \(q_{0.05}\) is the value at 5%
risk. Note that \(q_\gamma\) may be negative, which corresponds to losses.

The VaR criterion is defined as

\[X \succsim Y \iff q_\gamma(X) \ge q_\gamma(Y),\]

i.e., we set \(V(X) = q_\gamma(X)\) in (1.1.3).

In applications of VaR, for the \(\gamma\)-quantile of \(X\), the notation \(\mathrm{VaR}_\gamma(X)\) is frequently used;
we will keep the notation \(q_\gamma(X)\).

The particular choice of \(\gamma = 0.05\) is very common, but it has rather a psychological
explanation: 0.01 is “too small”, while 0.1 is “too large”. As a matter of fact, whether a
particular value of \(\gamma\) should be viewed as small or not depends on the situation. We can
view a probability of 0.05 as small if it is the probability that it will rain tomorrow.
However, the same number should be considered very large if it is the probability of being
involved in a traffic accident: it would mean that on the average you are likely to be involved
in an accident one out of twenty times you are in traffic.

EXAMPLE 1. Let a r.v. \(X\) (say, a random income) take on values 0, 10 with
probabilities 0.1 and 0.9, respectively, and let a r.v. \(Y\) take on the same values with respective
probabilities 0.07 and 0.93. The reader is encouraged to check that for \(\gamma = 0.05\), we have
\(q_\gamma(X) = q_\gamma(Y) = 0\) (look, for example, at Fig.7d in Section 0.1.3.4). So, \(X\) and \(Y\) are
equivalent under the VaR criterion. However, for \(\gamma = 0.08\) we have \(q_\gamma(X) = 0\) while \(q_\gamma(Y) = 10\),
that is, \(X\) is worse than \(Y\). So, the result of comparison depends on \(\gamma\).
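
For a discrete distribution, the quantile used in this example can be computed mechanically. The sketch below is an illustration under an assumption: it reads the loose definition as "the largest \(x\) with \(P(X < x) \le \gamma\)" for \(0 < \gamma < 1\); the rigorous definition in Section 0.1.3.4 should be consulted for borderline cases. The function name `quantile` and the list-of-pairs representation are hypothetical.

```python
from typing import Sequence, Tuple


def quantile(distribution: Sequence[Tuple[float, float]], gamma: float) -> float:
    """Largest x with P(X < x) <= gamma for a discrete r.v. X, assuming 0 < gamma < 1.

    `distribution` is a sequence of (value, probability) pairs.
    """
    below = 0.0  # P(X < value) for the value currently examined
    answer = None
    for value, prob in sorted(distribution):
        if below > gamma:
            break
        answer = value  # P(X < value) <= gamma, so this value qualifies
        below += prob
    return answer


if __name__ == "__main__":
    X = [(0, 0.1), (10, 0.9)]
    Y = [(0, 0.07), (10, 0.93)]
    for gamma in (0.05, 0.08):
        print(gamma, quantile(X, gamma), quantile(Y, gamma))
```

With this reading, the two levels of \(\gamma\) considered in the example lead to different rankings of \(X\) and \(Y\), in line with the discussion above.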


The VaR criterion is monotone. Indeed, if \(X \ge Y\) with probability one, then \(P(X \le x) \le
P(Y \le x)\), so in this case \(q_\gamma(X) \ge q_\gamma(Y)\). However, VaR is not strictly monotone.


EXAMPLE 2. Let Y be uniform on [0,2], and 


Y ifY <1, 
s a 


We see that \(P(X > Y) = P(1 < Y \le 2) = \frac{1}{2} > 0\). However, if \(x \le 1\), then \(P(X \le x) = P(Y \le x)\),
and hence \(q_\gamma(X) = q_\gamma(Y)\) if \(\gamma < \frac{1}{2}\).


The fact that the VaR criterion is not strictly monotone does not provide sufficient grounds 
to reject the VaR in any case; we discuss it in more detail in Section 1.2.3. However, we 
should be aware that this is not a flexible criterion since it does not take into account all 
values of r.v.’s, as we saw in the example above. 


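EXAMPLE 3. Let \(X\) be normal with mean \(m\) and variance \(\sigma^2\). Since the d.f. of \(X\)
is \(\Phi\left(\frac{x-m}{\sigma}\right)\) (see, e.g., Section 0.3.2.4), the \(\gamma\)-quantile of \(X\) is a solution to the equation
\(\Phi\left(\frac{x-m}{\sigma}\right) = \gamma\). Denote by \(q_{\gamma s}\) the \(\gamma\)-quantile of the standard normal distribution, i.e.,
\(\Phi(q_{\gamma s}) = \gamma\). Then we can rewrite the equation mentioned as \(\frac{x-m}{\sigma} = q_{\gamma s}\), and

\[q_\gamma(X) = m + q_{\gamma s}\sigma.\]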




The coefficient \(q_{\gamma s}\) depends only on \(\gamma\). Usually people choose \(\gamma < 0.5\), and in this case
\(q_{\gamma s} < 0\). For example, if \(\gamma = 0.05\), then \(q_{\gamma s} \approx -1.64\) (see Table 2 in Appendix, Section 2),
and the VaR criterion is preserved by the function \(q_\gamma(X) \approx m - 1.64\sigma\). Criteria of the type

\[V(X) = m - k\sigma, \tag{1.2.2}\]

where \(k\) is a positive number, are frequently used in practice, and not only for normal r.v.'s;
as we will see in Section 1.2.5, maybe too frequently. The expression in (1.2.2) can be
interpreted as follows. If we view \(X\) as a future income and variance or standard deviation
as a measure of riskiness, then we want the mean \(m\) to be as large as possible and \(\sigma\) as small
as possible. This is reflected by the minus sign in (1.2.2). The number \(k\) may be viewed as
a weight we assign to the standard deviation.


EXAMPLE 4. There are n = 10 assets with random returns X1, ..., Xn. The term “return” 
means the income per $1 investment. For example, if the today price of a stock is $11, 
while the yesterday price was $10, the return for this one-day period is H = 1.1. Note that 
a return X may be less than one, and in this case we face a loss. 

Assume X4, ...,Xn to be independent and their distributions to be closely approximated 
by the normal distribution with mean m and variance o°. 

Let us compare two strategies of investing \(n\) million dollars: either investing the whole
sum in one asset, for example, in the first, or distributing the investment sum equally between
the \(n\) assets. We proceed from the VaR criterion with \(\gamma = 0.05\).

For the first strategy, the income will be the r.v. \(Y_1 = nX_1 = 10X_1\). The mean \(E\{Y_1\} = nm\),
and \(\mathrm{Var}\{Y_1\} = n^2\sigma^2\), so to compute \(q_\gamma(Y_1)\) we should replace in (1.2.2) \(m\) by \(nm\), and \(\sigma\) by
\(n\sigma\). Replacing \(q_{\gamma s}\) by its approximate value \(-1.64\), we have


\[ q_\gamma(Y_1) \approx mn - 1.64\,n\sigma = 10m - 16.4\sigma. \]


For the second strategy, the income is the r.v. \(Y_2 = X_1 + \ldots + X_n\). Hence, \(E\{Y_2\} = nm\),
\(\mathrm{Var}\{Y_2\} = n\sigma^2\), and
\[ q_\gamma(Y_2) \approx mn - 1.64\sqrt{n}\,\sigma \approx 10m - 5.2\sigma. \]


Thus, the second strategy is preferable, which might be expected from the very beginning.
Nevertheless, in the next example we will see that if the \(X_i\)’s have a distribution different
from normal, we may jump to a different conclusion.
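The comparison in Example 4 can be checked numerically. The sketch below is only an illustration: the values of `m` and `sigma` are hypothetical (the example leaves them unspecified), and the helper `normal_var` is a name introduced here. It uses the exact standard normal quantile from Python's `statistics.NormalDist` rather than the rounded value −1.64.

```python
from math import sqrt
from statistics import NormalDist


def normal_var(mean: float, std: float, gamma: float) -> float:
    """gamma-quantile of a normal r.v. with the given mean and standard deviation."""
    return mean + NormalDist().inv_cdf(gamma) * std


n, gamma = 10, 0.05
m, sigma = 1.05, 0.2  # hypothetical per-asset mean return and standard deviation

# Strategy 1: Y1 = n * X1, so the mean is n*m and the standard deviation is n*sigma.
q_one_asset = normal_var(n * m, n * sigma, gamma)

# Strategy 2: Y2 = X1 + ... + Xn, so the mean is n*m and the standard deviation is sqrt(n)*sigma.
q_diversified = normal_var(n * m, sqrt(n) * sigma, gamma)

print(f"q(Y1) = {q_one_asset:.3f}, q(Y2) = {q_diversified:.3f}")
```

For any positive `sigma`, the diversified quantile is the larger of the two, since both strategies have the same mean and \(\sqrt{n} < n\).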


EXAMPLE 5¹. There are ten independent assets such that investment into each with
99% probability gives 4% profit, and with 1% probability the investor loses the whole
investment. Assume that we invest $10 million and compare the same two strategies as in
Example 4. Let us again apply the VaR criterion with \(\gamma = 0.05\).

If we invest all $10 million into the first asset, we will get $10.4 million with probability
0.99, and in the notation of the previous example \(q_\gamma(10X_1) = 10.4\).

For the second strategy, the number of successful investments has the binomial distribution
with parameters \(p = 0.99\), \(n = 10\). If the number of successes is \(k\), the income is
\(k \times 1\) million \(\times\, 1.04\). The d.f. \(F(x)\) of the income is given in the table below. The values of
\(F(x)\) are the values of the binomial d.f. with the parameters mentioned.


k              | ≤ 6             | 7        | 8        | 9        | 10
The income x   | ≤ 6.24          | 7.28     | 8.32     | 9.36     | 10.4
The d.f. F(x)  | ≤ 2.002·10^(−6) | 0.000114 | 0.004266 | 0.095618 | 1


¹This example is very close to an example from [1] presented also in [8, p.14] with the corresponding reference.


The 0.05-quantile of this distribution is 9.36 < 10.4. Therefore, following VaR, we 
should choose the first investment strategy. 

Note, however, that if we choose as \(\gamma\) a number slightly smaller than 0.01, for example
\(\gamma = 0.0095\), then the result will be different. In this case, \(q_\gamma(10X_1) = 0\), while the 0.0095-quantile
of the distribution presented in the table is again 9.36.

Certainly, the results of the comparison above should not be considered real recommendations.
On the contrary, the last example indicates a limitation of the application of VaR,
and shows that this criterion is quite sensitive to the choice of \(\gamma\).
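The quantiles in Example 5 can be reproduced with a few lines of code. This is a sketch under one stated assumption: the \(\gamma\)-quantile of a discrete distribution is taken as the smallest value \(x\) with \(F(x) \ge \gamma\) (the book's own definition is in Section 0.1.3, which is not reproduced here). The function names are introduced only for this illustration.

```python
from math import comb


def income_distribution(n=10, p=0.99, payoff=1.04):
    """Sorted (income, probability) pairs when 1 million goes into each of n assets."""
    return [(k * payoff, comb(n, k) * p**k * (1 - p) ** (n - k)) for k in range(n + 1)]


def quantile(dist, gamma):
    """Smallest value x with F(x) >= gamma (assumed convention); dist is sorted by value."""
    cumulative = 0.0
    for value, prob in dist:
        cumulative += prob
        if cumulative >= gamma:
            return value
    return dist[-1][0]


diversified = income_distribution()
single = [(0.0, 0.01), (10.4, 0.99)]  # all 10 million in the first asset

for gamma in (0.05, 0.0095):
    print(gamma, quantile(single, gamma), round(quantile(diversified, gamma), 2))
```

Under this convention the single-asset quantile switches from 10.4 to 0 as \(\gamma\) drops below 0.01, while the diversified quantile stays at 9.36, which is the sensitivity noted above.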


The reader can find more about the VaR criterion, for example, in [60], [66], [64]. Some
references may be found also in http://www.riskmetrics.com and http://www.gloriamundi.org.


1.2.3 An important remark: risk measures rather than criteria 


This simple but important remark concerns the two criteria above and practically all other
criteria we will consider in this chapter. The point is that we do not have to limit ourselves
to using only one criterion at a time. On the contrary, we can combine them.


For example, when considering a random income \(X\), we may compute its expectation
\(E\{X\}\) and its quantile \(q_\gamma(X)\). In this case, we will know what we can expect on the average,
and what is the worst conceivable (or likely) outcome. When comparing two r.v.’s, we
certainly may take into account both characteristics. How we will do this depends on our
preferences. The simplest way is to consider the linear combination \(\alpha E\{X\} + \beta q_\gamma(X)\),
where \(\alpha\) and \(\beta\) play the role of weights we assign to the mean and to the quantile. The
larger \(\beta\), the more cautious we are.


Under such an approach to risk assessment, various functions \(V(X)\) represent possible
characteristics of the random income \(X\) rather than criteria. In this case, we call \(V(X)\) a risk
measure.
1.2.4 Tail-Value-at-Risk (TailVaR)


Another term for this criterion is Tail conditional expectation (TCE). This is a modifica- 
tion of VaR. The motivation is illustrated by the following example. 




EXAMPLE 1. Consider two r.v.’s \(X\) and \(Y\) of the future income such that


X takes on values   | −2      | −1   | 10   | 20
with probabilities  | 0.01    | 0.02 | 0.47 | 0.5

Y takes on values   | −2·10^6 | −1   | 10   | 20
with probabilities  | 0.01    | 0.01 | 0.48 | 0.5


The probabilities that the income will be negative in both cases are small: 3% and 2%. For
\(\gamma = 0.025\), we would have \(q_\gamma(X) = -1\) and \(q_\gamma(Y) = 10\). So, under the VaR criterion, \(Y\)
is preferable, which does not look natural. While we may neglect negative values of the
income in the first case, this may be unreasonable in the second: a loss of 2 million can be
too serious to ignore, even if such an event occurs with a small probability of 1%.


In situations as above, we speak about the possibility of large deviations, or a heavy tail 
of the distribution (for the term “tail”, see also Section 0.2.5). We compare the “tails” of 
different distributions in more detail in Section 2.1.1. For now, we introduce a criterion 
which involves the mean values of large deviations. 

First, consider the function 


\[ V(X; t) = E\{X \mid X \le t\}, \qquad (1.2.3) \]
the mean value of \(X\) given that the income \(X\) did not exceed a level \(t\).


(Formally, the right member of (1.2.3) is defined by

\[ E\{X \mid X \le t\} = \frac{1}{P(X \le t)} \int_{-\infty}^{t} x\,dF(x) = \frac{1}{F(t)} \int_{-\infty}^{t} x\,dF(x), \qquad (1.2.4) \]

where \(F(x)\) is the d.f. of \(X\). The formula covers the cases of discrete and continuous r.v.’s
simultaneously if we understand the integral above as in (0.2.1.5) from Section 0.2.1. We
consider conditional expectations in detail in Section 0.7; though now, it is sufficient for
us just to use definition (1.2.4).)


If we are interested only in losses, then it suffices to consider \(t < 0\). Then \(V(X; t)\) is
negative.
Note that in such situations, people often consider not the income but the losses directly,
that is, instead of the r.v. \(X\), the r.v. \(\tilde X = -X\). Negative values of \(X\) correspond to positive
values of \(\tilde X\) and vice versa. In this case, \(E\{X \mid X \le t\} = -E\{\tilde X \mid \tilde X \ge |t|\}\) if \(t < 0\). The risk
measure \(E\{\tilde X \mid \tilde X > s\}\) is the expected value of the loss given that it has exceeded a level \(s\).
In insurance, it is called an expected policyholder deficit. See also Exercise 9.


Let us come back to \(V(X; t)\) and take, as the level \(t\), the \(\gamma\)-quantile \(q_\gamma(X)\). Accordingly,
we set
\[ V_{tail}(X) = E\{X \mid X \le q_\gamma(X)\}, \]

and define the rule of comparison of r.v.’s by the relation

\[ X \succsim Y \iff V_{tail}(X) \ge V_{tail}(Y). \]




EXAMPLE 2. Consider the r.v.’s from Example 1 for \(\gamma = 0.025\). As we already saw, in
this case, \(q_\gamma(X) = -1\) and \(q_\gamma(Y) = 10\). If the r.v. \(X\) took a value less than or equal to \(-1\), then it
can take on only values \(-1\) and \(-2\). Hence,

\[ V_{tail}(X) = E\{X \mid X \le -1\} = (-2)P(X = -2 \mid X \le -1) + (-1)P(X = -1 \mid X \le -1) = (-2)\cdot\frac{0.01}{0.03} + (-1)\cdot\frac{0.02}{0.03} = -\frac{4}{3}. \]

For \(Y\), computing in the same manner, we have

\[ V_{tail}(Y) = (-2\cdot 10^6)\cdot\frac{0.01}{0.5} + (-1)\cdot\frac{0.01}{0.5} + 10\cdot\frac{0.48}{0.5} = -39990.42, \]

which is much less than \(-4/3\). So, \(X \succsim Y\).
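The two tail expectations above are easy to recompute. The following sketch assumes, as an illustration only, that the \(\gamma\)-quantile of a discrete r.v. is the smallest value \(x\) with \(P(X \le x) \ge \gamma\); the helper names `quantile` and `tail_var` are introduced here and are not part of the text.

```python
def quantile(dist, gamma):
    """Smallest value x with P(X <= x) >= gamma; dist is a list of (value, probability)."""
    cumulative = 0.0
    for value, prob in sorted(dist):
        cumulative += prob
        if cumulative >= gamma:
            return value
    return max(v for v, _ in dist)


def tail_var(dist, gamma):
    """E{X | X <= q_gamma(X)} for a discrete distribution."""
    q = quantile(dist, gamma)
    tail = [(v, p) for v, p in dist if v <= q]
    tail_prob = sum(p for _, p in tail)
    return sum(v * p for v, p in tail) / tail_prob


X = [(-2, 0.01), (-1, 0.02), (10, 0.47), (20, 0.5)]
Y = [(-2e6, 0.01), (-1, 0.01), (10, 0.48), (20, 0.5)]

print(quantile(X, 0.025), tail_var(X, 0.025))
print(quantile(Y, 0.025), tail_var(Y, 0.025))
```

The VaR criterion looks only at the quantiles and so ranks \(Y\) above \(X\), whereas the tail expectations reverse the order because they account for the rare loss of 2 million.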




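Now, assume, for simplicity, that the income consists of a fixed non-random positive part
and a random loss \(\xi\). Since the positive part is certain, we can exclude it from consideration,
setting the income \(X = -\xi\). Let us denote by \(G(x)\) the d.f. of \(\xi\), and set \(\bar G(x) = P(\xi > x) =
1 - G(x)\), the tail of the distribution of \(\xi\). Suppose \(G(x)\) is continuous. For \(t < 0\), we have

\[ P(X \le t) = P(\xi \ge -t) = P(\xi \ge |t|) = \bar G(|t|), \qquad (1.2.5) \]

and

\[ E\{X \mid X \le t\} = E\{-\xi \mid -\xi \le t\} = -E\{\xi \mid \xi \ge |t|\} = -\frac{1}{P(\xi \ge |t|)}\int_{|t|}^{\infty} x\,dG(x) = -\frac{1}{\bar G(|t|)}\int_{|t|}^{\infty} x\,dG(x) = \frac{1}{\bar G(|t|)}\int_{|t|}^{\infty} x\,d\bar G(x). \]

Integration by parts implies that

\[ E\{X \mid X \le t\} = \frac{1}{\bar G(|t|)}\left(-|t|\,\bar G(|t|) - \int_{|t|}^{\infty} \bar G(x)\,dx\right) = -\left(|t| + \frac{1}{\bar G(|t|)}\int_{|t|}^{\infty} \bar G(x)\,dx\right). \qquad (1.2.6) \]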

EXAMPLE 3. (a) Let \(\bar G(x) = e^{-x}\), that is, \(\xi\) is a standard exponential r.v. We may avoid
long calculations if we recall that \(\xi\) has the memoryless property (see, e.g., Section 0.3.2.2).
By this property, if \(\xi\) has exceeded a level \(t\), the overshoot (over \(t\)) has the same distribution
as the r.v. \(\xi\) itself. Since \(E\{\xi\} = 1\), we can write

\[ E\{\xi \mid \xi > t\} = t + 1. \]

Hence,
\[ |E\{X \mid X \le q\}| = E\{\xi \mid \xi \ge |q|\} = |q| + 1, \]
and
\[ \gamma = P(X \le q) = P(\xi \ge |q|) = \bar G(|q|) = e^{-|q|}. \]

Hence, \(|q| = \ln\frac{1}{\gamma}\), provided \(\gamma > 0\). Thus, for \(\gamma > 0\),
\[ |E\{X \mid X \le q_\gamma\}| = \ln\frac{1}{\gamma} + 1, \quad\text{and}\quad V_{tail}(X) = E\{X \mid X \le q_\gamma\} = -\ln\frac{1}{\gamma} - 1. \]


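(b) Now, let \(\bar G(x) = 1/(1+x)^2\) for \(x \ge 0\). This is a particular case of the Pareto distribution
that we consider in more detail in Section 2.1.1. The tail of this distribution is viewed
as “heavy”.

To find \(q = q_\gamma\), we make use of (1.2.5) and write

\[ \gamma = P(X \le q) = \bar G(|q|) = \frac{1}{(1+|q|)^2}, \qquad (1.2.7) \]
which implies
\[ |q_\gamma| = \frac{1}{\sqrt{\gamma}} - 1, \qquad (1.2.8) \]

again provided \(\gamma > 0\). From (1.2.6) it follows that

\[ |E\{X \mid X \le q\}| = |q| + \frac{1}{\bar G(|q|)}\int_{|q|}^{\infty} \bar G(x)\,dx = |q| + (1+|q|)^2 \int_{|q|}^{\infty} \frac{dx}{(1+x)^2} = |q| + (1+|q|)^2\cdot\frac{1}{1+|q|} = 2|q| + 1. \]

Substituting (1.2.8), we have

\[ |E\{X \mid X \le q_\gamma\}| = \frac{2}{\sqrt{\gamma}} - 1. \]

The absolute value above corresponds to the losses. For the (negative) income \(X\),

\[ V_{tail}(X) = E\{X \mid X \le q_\gamma\} = 1 - \frac{2}{\sqrt{\gamma}}. \]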
(c) Let us compare the two cases above. In the first, \(P(\xi > x) = e^{-x}\); in the second,
\(P(\xi > x) = 1/(1+x)^2\). The latter function converges to zero much more slowly than the former.
One may say that the tail in the latter case is much “heavier”. It means that the
probability of substantial losses is larger in the latter case, and we should expect that
under the TailVaR criterion the exponential distribution is “better”.
This is indeed the case for all \(\gamma \in (0,1)\). To show it, we should prove that the difference

\[ \left(\frac{2}{\sqrt{\gamma}} - 1\right) - \left(\ln\frac{1}{\gamma} + 1\right) = \frac{2}{\sqrt{\gamma}} - \ln\frac{1}{\gamma} - 2 \]
is positive for all \(\gamma \in (0,1)\). Denote this difference by \(C(\gamma)\). Note that \(C(1) = 0\), and

\[ C'(\gamma) = -\frac{1}{\gamma^{3/2}} + \frac{1}{\gamma} = \frac{1}{\gamma^{3/2}}\left(\sqrt{\gamma} - 1\right) < 0 \quad\text{for all } \gamma \in (0,1). \]
Hence, \(C(\gamma) > 0\) for \(\gamma < 1\).
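As a quick sanity check of part (c), the two closed-form expected tail losses can be tabulated for several levels \(\gamma\). The function names below are introduced for this sketch only; the formulas are the ones derived in parts (a) and (b).

```python
from math import log, sqrt


def expected_tail_loss_exponential(gamma):
    """|E{X | X <= q_gamma}| for a standard exponential loss: ln(1/gamma) + 1."""
    return log(1 / gamma) + 1


def expected_tail_loss_pareto(gamma):
    """|E{X | X <= q_gamma}| when P(loss > x) = 1/(1+x)^2: 2/sqrt(gamma) - 1."""
    return 2 / sqrt(gamma) - 1


for gamma in (0.001, 0.01, 0.05, 0.25, 0.5, 0.9):
    e = expected_tail_loss_exponential(gamma)
    p = expected_tail_loss_pareto(gamma)
    # The last column is the difference C(gamma) from the text.
    print(f"gamma={gamma:<6} exponential={e:8.3f} pareto={p:8.3f} C={p - e:8.3f}")
```

The difference column stays positive on the whole grid and shrinks toward zero as \(\gamma\) approaches 1, in agreement with \(C(1) = 0\) and \(C'(\gamma) < 0\).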


Next, we test the TailVaR for monotonicity. It turns out that, in general, the TailVaR
criterion is not monotone.




EXAMPLE 4. Though we are interested in losses, to make the example illustrative, we
will consider non-negative r.v.’s \(X = X(\omega)\) and \(Y = Y(\omega)\). (Subtracting from both r.v.’s a
large number \(c\), we may come to r.v.’s with negative values, but the result of comparison of
\(X - c\) and \(Y - c\) will be the same as for \(X\) and \(Y\).)

Let the space of elementary outcomes \(\Omega = \{\omega_1, \omega_2, \omega_3\}\), and the probabilities and values
of \(X\) and \(Y\) be as follows:

        | ω1  | ω2  | ω3
P(ω) =  | 0.1 | 0.4 | 0.5
X(ω) =  | 0   | 10  | 20
Y(ω) =  | 0   | 10  | 10

Clearly, \(P(X \ge Y) = 1\) and \(P(X > Y) = 0.5 > 0\), so it is quite reasonable to prefer \(X\) to \(Y\).
Set, however, \(\gamma = 0.2\). Then, as we can see from the table (or by graphing the d.f.’s of \(X\)
and \(Y\)), the quantiles \(q_\gamma(X) = q_\gamma(Y) = 10\). Now, \(V_{tail}(X) = E\{X \mid X \le 10\} = 0\cdot\frac{1}{5} + 10\cdot\frac{4}{5} =
8\), while \(V_{tail}(Y) = E\{Y \mid Y \le 10\} = E\{Y\} = 0\cdot 0.1 + 10\cdot 0.9 = 9\). Thus, with respect to the
TailVaR, \(Y\) is better than \(X\), which contradicts common sense.


Nevertheless, the TailVaR criterion arose as a result of reasonable argumentation. There- 
fore, it makes sense not to reject it but realize each time in what situations the monotonicity 
property is fulfilled. Note also the following. 

First of all, the TailVaR is monotone in the class of continuous r.v.’s. A hint on how
to show this is given in Exercise 7.

It may also be shown that if \(\Omega\) is finite and all \(\omega\)’s are equiprobable, then under some
mild conditions on r.v.’s or for a slightly modified criterion, monotonicity does take place.
We consider it in more detail in Section 1.3. The discussion there points out how to redefine
the TailVaR to make it monotone.


1.2.5 The mean-variance criterion 


This criterion is, in a sense, the same as (1.2.2), but the motivation and derivation are
different. Consider an investor expecting a random income \(X\). Set \(m_X = E\{X\}\), \(\sigma_X^2 =
\mathrm{Var}\{X\}\). Suppose the investor measures the riskiness of \(X\) by its variance, and wishes the
mean income \(m_X\) to be as large as possible and the variance \(\sigma_X^2\) to be as small as possible. The
quality of the r.v. \(X\) for such an investor is determined by a function of \(m_X\) and \(\sigma_X^2\). In the
simplest case, it is a linear function of \(m_X\) and \(\sigma_X\), and we can write it as


\[ V(X) = \tau m_X - \sigma_X, \qquad (1.2.9) \]


where the minus reflects the fact that the quality decreases as the variance increases. 

The positive parameter \(\tau\) plays the role of a weight assigned to \(m_X\): a larger \(\tau\) indicates
that the investor values the mean more highly. This parameter is usually called a tolerance to
risk.

We assigned, unlike in (1.2.2), a weight to the mean rather than to the standard deviation,
merely following a tradition in Finance; it does not matter which parameter is
endowed with a coefficient. For example, we can write \(V(X) = \tau\left(m_X - \frac{1}{\tau}\sigma_X\right)\). Then a weight




is assigned to the standard deviation, while the factor \(\tau\) in the very front does not change
the comparison rule.

Note also that often, instead of (1.2.9), people consider \(V(X) = \tau m_X - \sigma_X^2\), but the difference
is also non-essential: both criteria proceed from the same measure of riskiness. The
choice of one of them is rather a matter of convenience.

The function \(V(X)\) in (1.2.9) preserves the corresponding preference order \(\succsim\) among
r.v.’s:

\[ X \succsim Y \iff \tau m_X - \sigma_X \ge \tau m_Y - \sigma_Y. \]

It is noteworthy that when presenting (1.2.9), we did not assume r.v.’s under consideration 
to be normal, which we did when deriving criterion (1.2.2). We will see that this may 
cause problems. The mean-variance criteria (in slightly different forms) are very popular, 


especially in Finance, and at first glance look quite natural. However, there are situations 
where the choice of such criteria may contradict common sense. 


EXAMPLE 1. Let \(X = 0\), a number \(a > 1\), and

\[ Y = \begin{cases} a & \text{with probability } \frac{1}{a}, \\ 0 & \text{with probability } 1 - \frac{1}{a}. \end{cases} \]

Clearly, \(E\{Y\} = 1\) and \(\mathrm{Var}\{Y\} = E\{Y^2\} - (E\{Y\})^2 = a - 1\). Then,

\[ V(Y) = \tau\cdot 1 - \sqrt{a-1} = \tau - \sqrt{a-1}, \quad\text{while}\quad V(X) = \tau\cdot 0 - 0 = 0. \qquad (1.2.10) \]

So, whatever \(\tau\) is, we can choose a sufficiently large \(a\) for which \(V(Y) < 0\). On the other
hand, \(V(X) = 0\), and under the mean-variance criterion, \(Y\) is worse than \(X\), whereas \(P(X \le
Y) = 1\). Clearly, if we replace in (1.2.10) the standard deviation \(\sqrt{a-1}\) by the variance
\(a - 1\), then the difference will be even more dramatic.


Note also that it would be a mistake to think that the example above is contrived, and that in
practical problems we do not encounter such cases. To show this, consider


EXAMPLE 2. Let \(X\) take on values from \([1, \infty)\), and \(P(X > x) = 1/x^{\alpha}\) for all \(x \ge 1\) and
some \(\alpha > 2\). This is a version of the Pareto distribution we discuss in Section 2.1.1. The
Pareto distribution, in different versions, is used in many applications including actuarial
modeling. It is not difficult to compute that

\[ m_X = \frac{\alpha}{\alpha - 1}, \qquad \sigma_X^2 = \frac{\alpha}{(\alpha-2)(\alpha-1)^2}. \]

The reader can check it right away or wait until Section 2.1.1.
Let \(Y\) be uniformly distributed on [0,1]. Obviously, \(X \ge Y\) with probability one.
In accordance with (1.2.9),

\[ V(X) = \tau\,\frac{\alpha}{\alpha-1} - \frac{1}{\alpha-1}\sqrt{\frac{\alpha}{\alpha-2}}, \quad\text{and}\quad V(Y) = \tau\cdot\frac{1}{2} - \frac{1}{\sqrt{12}} \qquad (1.2.11) \]

(for the standard deviation of the uniform distribution, see Section 0.3.2.1).




We see from (1.2.11) that, whatever \(\tau\) is, if \(\alpha\) approaches 2, the function \(V(X)\) converges
to \(-\infty\). Consequently, for any \(\tau\), we can choose \(\alpha\) (and hence a r.v. \(X\)) such that \(V(X) <
V(Y)\).

Thus, under the mean-variance criterion, \(X\) is worse than \(Y\), and consequently the criterion
(1.2.9) is not monotone.
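The effect is easy to see numerically. The sketch below evaluates the criterion \(\tau m - \sigma\) for the Pareto r.v. \(X\) with \(P(X > x) = x^{-\alpha}\), using the standard moments \(m_X = \alpha/(\alpha-1)\) and \(\sigma_X^2 = \alpha/((\alpha-2)(\alpha-1)^2)\), and for the uniform r.v. \(Y\) on [0,1]. The value of `tau` is hypothetical and chosen only for illustration.

```python
from math import sqrt


def v_pareto(tau, alpha):
    """tau*m - sigma for P(X > x) = x**(-alpha), x >= 1, alpha > 2."""
    mean = alpha / (alpha - 1)
    std = sqrt(alpha / (alpha - 2)) / (alpha - 1)
    return tau * mean - std


def v_uniform(tau):
    """tau*m - sigma for the uniform distribution on [0, 1]."""
    return tau * 0.5 - 1 / sqrt(12)


tau = 5.0  # hypothetical tolerance to risk
for alpha in (3.0, 2.1, 2.01, 2.001):
    print(alpha, round(v_pareto(tau, alpha), 3), round(v_uniform(tau), 3))
```

Although \(X \ge 1 \ge Y\) always, the value assigned to \(X\) drops below that of \(Y\) once \(\alpha\) is close enough to 2, because the standard deviation of \(X\) grows without bound.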


It is also worth noting that the linearity of the function in the r.-h.s. of (1.2.9) is not an 
essential circumstance, neither is the choice of particular r.v.’s X and Y. 


One may observe the same phenomenon for \(V(X)\) equal to almost
any function \(g(m_X, \sigma_X)\) of the mean and standard deviation.
Moreover, for any r.v. \(X\), we may point out a r.v. \(Y\) such that \(P(Y \ge X) = 1\)
whereas \(V(Y) < V(X)\).


More precisely, it may look as follows. Let \(V(X) = g(m_X, \sigma_X)\). To avoid cumbersome
formulations, assume that \(g(x,y)\) is smooth. Since we want the mean to be large and the
variance to be small, it is natural to assume that the partial derivatives \(g_1(x,y) = \frac{\partial}{\partial x} g(x,y) \ge 0\),
\(g_2(x,y) = \frac{\partial}{\partial y} g(x,y) < 0\).


Proposition 1 Assume, in addition, that the partial derivatives \(g_1(x,y)\) and \(g_2(x,y)\) are
continuous functions. Then for any r.v. \(X\) with a finite variance, there exists a r.v. \(Y\) such
that \(P(Y \ge X) = 1\), while

\[ g(m_X, \sigma_X) > g(m_Y, \sigma_Y). \]


We will prove it at the end of this section.


Proposition 1 is a strong argument against using variance as a measure of risk. However,
if we restrict ourselves to a sufficiently narrow class of r.v.’s, the monotonicity property may
hold.

In particular, this is true if we consider only normal r.v.’s because there are no two normal
r.v.’s \(X\) and \(Y\) with different variances and such that \(X \le Y\) with probability one.

To show it rigorously, assume that the normal r.v.’s \(X\) and \(Y\) mentioned exist. We have
\(P(X \le x) = \Phi((x - m_X)/\sigma_X)\) and \(P(Y \le x) = \Phi((x - m_Y)/\sigma_Y)\). Since \(P(Y \ge X) = 1\), it is
true that \(P(Y \le x) \le P(X \le x)\), and hence \(\Phi((x - m_Y)/\sigma_Y) \le \Phi((x - m_X)/\sigma_X)\) for any \(x\).


The function \(\Phi(x)\) is strictly increasing. Therefore, from the last inequality, it follows
that \(\frac{x - m_Y}{\sigma_Y} \le \frac{x - m_X}{\sigma_X}\) for all \(x\). Certainly, this cannot be true if \(\sigma_Y \ne \sigma_X\) because two lines
with different slopes intersect, and at only one point.

On the other hand, if \(\sigma_Y = \sigma_X\), the comparison is trivial: \(Y \succsim X\) if \(m_Y \ge m_X\).

The case of normal r.v.’s is simple because the normal distribution is characterized only
by two parameters: mean \(m\) and standard deviation \(\sigma\). Each normal distribution may be
identified with a point \((m, \sigma)\) in a plane, and the rule of comparison will be equivalent to a
rule of comparison of points in this plane.




If we consider a family of distributions with three or more parameters but still compare 
these distributions proceeding from their means and variances, we may come to paradoxes 
similar to what we saw above. 

We touch on one more example of the violation of monotonicity. Consider the family of
r.v.’s \(c + X_{a\nu}\), where \(c\) is a parameter and \(X_{a\nu}\) has the Γ-distribution with parameters \(a, \nu\)
(see Section 0.3.2.3 and Fig.0.13 there). The distribution of \(c + X_{a\nu}\) is called a translated
Γ-distribution or a Γ-distribution with a shift; it is widely used in many areas including
insurance as we will see in this book repeatedly. The distribution is asymmetric, and it may
be only very roughly characterized by its mean and variance. So, in this case, it is possible
to build an example of violation of the monotonicity property. We skip details; the first
such example was suggested by K. Borch [15].

In conclusion, it is worth again emphasizing that the reasoning above does not mean that
we should not use mean-variance criteria, but it does mean that we should be cautious.


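Proof of Proposition 1 uses the Taylor expansion for functions of two variables. Let
\(E\{X\} = m\), \(\mathrm{Var}\{X\} = \sigma^2\), and a number \(\varepsilon \in (0,1)\). Set \(Y = X + \xi_\varepsilon\), where the r.v. \(\xi_\varepsilon\) is
independent of \(X\), and
\[ \xi_\varepsilon = \begin{cases} \varepsilon^{-1} & \text{with probability } \varepsilon^3, \\ 0 & \text{with probability } 1 - \varepsilon^3. \end{cases} \]

Obviously, \(P(Y \ge X) = 1\). Furthermore, \(E\{\xi_\varepsilon\} = \varepsilon^2\), \(\mathrm{Var}\{\xi_\varepsilon\} = \varepsilon - \varepsilon^4\), and hence, \(E\{Y\} =
m + \varepsilon^2\), and \(\mathrm{Var}\{Y\} = \sigma^2 + \varepsilon - \varepsilon^4\).

Then \(\sigma_Y = \sqrt{\sigma^2 + \varepsilon - \varepsilon^4}\). Applying Taylor’s expansion for this function of \(\varepsilon\) (see the
Appendix, (4.2.3)), and assuming \(\sigma \ne 0\), we get that \(\sigma_Y = \sigma + \frac{1}{2\sigma}\varepsilon + o(\varepsilon)\), where here and
below \(o(\varepsilon)\) stands for a remainder negligible as \(\varepsilon \to 0\). (See also the Appendix, Section
4.1.)

By the Taylor expansion for \(g(x,y)\), we have
\[ g(m_Y, \sigma_Y) = g\left(m + \varepsilon^2,\ \sigma + \frac{1}{2\sigma}\varepsilon + o(\varepsilon)\right) = g(m,\sigma) + g_1(m,\sigma)\varepsilon^2 + g_2(m,\sigma)\left(\frac{1}{2\sigma}\varepsilon + o(\varepsilon)\right) + o(\varepsilon) = g(m,\sigma) + \frac{1}{2\sigma}\, g_2(m,\sigma)\,\varepsilon + o(\varepsilon). \]
Because \(g_2(m,\sigma) < 0\) and the remainder \(o(\varepsilon)\) is negligible for small \(\varepsilon\), there exists \(\varepsilon > 0\)
such that \(\frac{1}{2\sigma}\, g_2(m,\sigma)\,\varepsilon + o(\varepsilon) < 0\).
For such an \(\varepsilon\), we have \(g(m_Y, \sigma_Y) < g(m,\sigma) = g(m_X, \sigma_X)\).
In the case \(\sigma = 0\), we have \(\sigma_Y = \sqrt{\varepsilon - \varepsilon^4} = \sqrt{\varepsilon} + o(\varepsilon)\), and the proof is similar. ■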
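The construction in the proof can be illustrated numerically. In the sketch below the criterion \(g(m, \sigma) = \tau m - \sigma\) and the moments of \(X\) are hypothetical choices made only for this illustration; the perturbation is the one from the proof, namely a r.v. equal to \(1/\varepsilon\) with probability \(\varepsilon^3\) and to zero otherwise, independent of \(X\).

```python
from math import sqrt


def perturbed_moments(m, sigma, eps):
    """Mean and standard deviation of Y = X + xi, where xi is independent of X,
    equals 1/eps with probability eps**3 and 0 otherwise."""
    mean_y = m + eps**2
    std_y = sqrt(sigma**2 + eps - eps**4)
    return mean_y, std_y


def g(mean, std, tau=2.0):
    """Hypothetical smooth criterion, increasing in the mean and decreasing in std."""
    return tau * mean - std


m, sigma = 1.0, 0.5  # hypothetical moments of X
for eps in (0.5, 0.1, 0.01):
    my, sy = perturbed_moments(m, sigma, eps)
    # A negative difference means Y is ranked below X although Y >= X always.
    print(eps, g(my, sy) - g(m, sigma))
```

For these particular numbers the difference is positive at \(\varepsilon = 0.5\) but becomes negative for the smaller values, which matches the statement that a suitable small \(\varepsilon\) exists rather than that every \(\varepsilon\) works.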




1.3 On coherent measures of risk 


In this section, we discuss some desirable properties of risk measures. It is important
to emphasize, however, that we should not expect these properties to hold in all situations,
especially simultaneously. The properties themselves have long been known, but they attracted
a great deal of attention due to the paper [8], which gave deeper insight into the
nature of some useful criteria. See also a further discussion in [9], the monograph [33], and
“an exposition for the lay actuary” with some examples in [91].
We describe properties below in terms of \(V(X)\) preserving \(\succsim\).


I. Subadditivity. For all \(X, Y \in \mathcal{X}\),
\[ V(X+Y) \ge V(X) + V(Y). \qquad (1.3.1) \]


This requirement concerns the diversification of portfolios. Let us view X and Y as the 
random results of the investments into two assets, and V(X) and V(Y) as the values of the 
corresponding investments. Then the left member of (1.3.1) is the value of the portfolio 
consisting of the two investments mentioned, while the right member is the sum of the 
values of X and Y, considered separately. 

Note also that if (1.3.1) is true for two r.v.’s, it is true for any number of r.v.’s. 

Thus, under a preference with this property, it is reasonable to have many risks in one 
portfolio (when risks may, in a sense, compensate each other) rather than to deal with these 
risks separately. 

II. Positive Homogeneity. For any \(\lambda \ge 0\) and \(X \in \mathcal{X}\),


\[ V(\lambda X) = \lambda V(X). \]
III. Translation Invariance. For any number \(c\) and \(X \in \mathcal{X}\),
\[ V(X + c) = V(X) + c. \]


Properties II-III establish invariance with respect to the change of scale. For example, if 
we decide to measure income not in dollars but in cents, under the requirement II, the value 
of investment (if this value is measured in money units) should be multiplied by 100. If we 
add to a random income a certain amount c, in accordance with III, the value of the income 
should increase by c. 

Note at once that the value of investment may be measured not only in money units. This
is the case, for example, when we apply the utility theory which we discuss in detail in
Section 3. Properties I-III are not as innocent as they might seem, and many criteria we
consider later do not satisfy them. However, if I-III hold, it certainly “makes life better”.

Since in this setup, in general, \(V(X)\) is not connected with some particular probability
measure, we call \(V(X)\) monotone if \(V(X) \ge V(Y)\) when \(X(\omega) \ge Y(\omega)\) for all \(\omega\).

Criteria satisfying I-III together with the monotonicity property are called coherent. 

Because the mean-variance criterion is monotone only in special situations, consider, as 
examples, the first three criteria from Section 1.2. 

The mean-value function \(V(X) = E\{X\}\) satisfies all three properties above, as is easy to
see. (For example, \(E\{X + Y\} = E\{X\} + E\{Y\}\), and similarly one can check the other
properties.) This is true not because this criterion is very good, but because it is very
simple.

The VaR and TailVaR satisfy II-III. Assume, for simplicity, that \(X\) is a continuous r.v. and \(\lambda > 0\).
Then \(P(X \le q_\gamma(X)) = \gamma\). To show that \(q_\gamma(\lambda X) = \lambda q_\gamma(X)\), we should prove that \(P(\lambda X \le
\lambda q_\gamma(X)) = \gamma\). But it is obvious since \(\lambda\) cancels out.




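To prove that, for example, \(V_{tail}(\lambda X) = \lambda V_{tail}(X)\), it suffices to write \(V_{tail}(\lambda X) = E\{\lambda X \mid \lambda X \le
q_\gamma(\lambda X)\} = \lambda E\{X \mid \lambda X \le \lambda q_\gamma(X)\} = \lambda E\{X \mid X \le q_\gamma(X)\} = \lambda V_{tail}(X)\).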


Property III is considered similarly. 


It remains to check the main (and most sophisticated) property I. The VaR does not 
satisfy this property in general as is shown in 


EXAMPLE 1. Let us revisit Example 1.2.2-5. Note that if Property I holds for two r.v.’s,
then it holds for any number of r.v.’s. In Example 1.2.2-5, we computed that \(q_\gamma(X_1 + \ldots +
X_{10}) = 9.36\) if \(\gamma = 0.05\). Since \(P(X_i = 0) = 0.01\), for the same \(\gamma\), the quantile \(q_\gamma(X_i) = 1.04\).
Then \(q_\gamma(X_1) + \ldots + q_\gamma(X_n) = 10 \cdot 1.04 = 10.4 > 9.36\), and hence Property I does not hold.


In general, the TailVaR criterion does not satisfy Property I either, and, as we know, it
is not even monotone. Nevertheless, in Example 2c below, we consider some conditions
under which both properties hold. As was mentioned in Section 1.2.4, in particular, it
concerns the case where the space \(\Omega\) is finite and all \(\omega\)’s are equally likely.

Since in the scheme of equiprobable \(\omega\)’s, the r.v.’s themselves may assume various values,
the requirement that all \(\omega\)’s are equally likely is not very strong, and the TailVaR criterion
may prove to be efficient in many situations. Nevertheless, it is worthwhile to make the
following two remarks.

The goal of the TailVaR criterion is to exclude, as far as it is possible, strategies which 
could lead to large losses, but it does not take into account possibilities of other values of 
income, large or moderate. One may say it is a pessimistic criterion. 

Secondly, the TailVaR criterion and other criteria we considered are normative, that is, 
invented by people. These criteria are applied consciously by companies for explicitly 
stated goals and in explicitly designated situations. When we deal with separate people, the 
picture may be different. Real individuals are not always pessimistic, often make decisions 
at an intuitive level, and sometimes are quite sophisticated. To describe their behavior, we 
should proceed from qualitatively different principles. An introduction to the correspond- 
ing theory is given in Sections 3-4. 


Next, we give an implicit representation of the whole class of criteria satisfying Properties
I-III together with monotonicity. Consider r.v.’s \(X = X(\omega)\) defined on a sample space
\(\Omega\). Denote by \(E_P\{X\}\) the expected value of \(X\) with respect to a probability measure \(P\)
defined on sets from \(\Omega\).

It was shown in [8] that functions \(V(X)\) satisfying all properties mentioned are functions
which may be represented as


\[ V(X) = \min_{P \in \mathcal{P}} E_P\{X\}, \qquad (1.3.2) \]


where \(\mathcal{P} = \{P\}\) is a family of probability measures \(P\) on \(\Omega\). In other words, each function
\(V(\cdot)\) corresponds to a family \(\mathcal{P}\), and vice versa.

For the reader familiar with the notion of infimum, note that in general the minimum
above may not be attainable, and more rigorously, a necessary and sufficient condition is
the existence of a family \(\mathcal{P}\) such that \(V(X) = \inf_{P \in \mathcal{P}} E_P\{X\}\).




EXAMPLE 2. (a) Let \(P_0\) be the probability measure representing the “actual” probabilities
of the occurrence of events \(\omega\), and let \(\mathcal{P}\) consist of only one measure \(P_0\). Then (1.3.2)
implies that \(V(X) = E_{P_0}\{X\}\), and we deal with the mean-value criterion.


(b) Let \(\Omega = \{\omega_1, \ldots, \omega_n\}\) be finite, and let \(\mathcal{P}\) consist of all probability measures on events
from \(\Omega\). Denote by \(P(\omega)\) the probability of \(\omega\) corresponding to measure \(P\). The expected
value with respect to \(P\) is

\[ E_P\{X\} = \sum_{i=1}^{n} X(\omega_i) P(\omega_i). \]
To minimize the last expression, we should choose \(P\) which assigns probability one to
an \(\omega\) at which \(X(\omega)\) is minimal. For such a measure, \(E_P\{X\} = \min_\omega X(\omega)\). So,


\[ V(X) = \min_{\omega} X(\omega). \]


(c) Let again \(\Omega = \{\omega_1, \ldots, \omega_n\}\). Assume that the “actual” probability measure \(P_0\) assigns
the equal probabilities \(\frac{1}{n}\) to each \(\omega\). We show that the TailVaR criterion admits the
representation (1.3.2).

To make our reasoning simpler, consider only r.v.’s \(X(\omega)\) taking different values for different
\(\omega\)’s.

We fix a \(\gamma\) and denote by \(k = k(\gamma)\) the integer such that \(\frac{k-1}{n} \le \gamma < \frac{k}{n}\); that is, \(k = [n\gamma] + 1\),
where \([a]\) denotes the integer part of \(a\). Consider all sets \(A\) from \(\Omega\) containing exactly \(k\)
points. Let \(P_A(\omega)\) be the measure assigning the probability \(\frac{1}{k}\) to each point from \(A\), and
zero probability to all other \(n - k\) points. Let \(\mathcal{P}\) consist of all such measures \(P_A\).

Consider now a r.v. \(X(\omega)\) and set \(x_i = X(\omega_i)\). We have assumed the \(x_i\)’s to be different.
Without loss of generality, we can suppose that \(x_1 < x_2 < \ldots < x_n\), since otherwise we can
renumber the \(\omega\)’s. The reader is invited to verify that with respect to the original measure
\(P_0\), first, \(q_\gamma(X) = x_k\), where \(k = k(\gamma)\) is chosen above, and second, that


\[ \mathrm{TailVaR}(X) = E\{X \mid X \le q_\gamma\} = \frac{1}{k}(x_1 + \ldots + x_k). \qquad (1.3.3) \]
On the other hand, for any \(A\) consisting of \(k\) points, say, points \(\omega_{i_1}, \ldots, \omega_{i_k}\),
\[ E_{P_A}\{X\} = X(\omega_{i_1})\frac{1}{k} + \ldots + X(\omega_{i_k})\frac{1}{k} = \frac{1}{k}(x_{i_1} + \ldots + x_{i_k}) \ge \frac{1}{k}(x_1 + \ldots + x_k) = E\{X \mid X \le q_\gamma\}, \]


because \(x_1, \ldots, x_k\) are the \(k\) least values of \(X\). Thus, the minimum in (1.3.2) is attained at
\(P_{A_0}\), where \(A_0 = \{\omega_1, \ldots, \omega_k\}\), and this minimum is equal to TailVaR(X).

Note that if the \(x_i\)’s are not different, formally we cannot reason as above since in this case
(1.3.3) may not be true, and we may construct an example close to Example 1.2.4-4. However,
in this case, we may modify the TailVaR criterion itself, defining it as in (1.3.3). In the
case of different \(x_i\)’s, it will coincide with the “usual” TailVaR.
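The representation in part (c) can be verified by brute force on a small sample space. The sketch below enumerates all \(k\)-point sets \(A\), computes the mean of \(X\) under the uniform measure on \(A\), and compares the minimum with the average of the \(k\) smallest values from (1.3.3). The list `values` is a hypothetical set of distinct \(x_i\)’s, and \(\gamma\) is chosen so that \(n\gamma\) is not an integer.

```python
from itertools import combinations
from math import floor


def min_over_uniform_subsets(values, k):
    """Minimum of E_{P_A}{X} over all k-point sets A, where P_A is uniform on A."""
    return min(sum(subset) / k for subset in combinations(values, k))


values = [7.0, -3.0, 12.0, 0.5, 4.0, -1.0, 9.0, 2.0]  # hypothetical distinct x_i
n, gamma = len(values), 0.3
k = floor(n * gamma) + 1  # k = [n*gamma] + 1, as in the text

lowest_k_mean = sum(sorted(values)[:k]) / k  # right-hand side of (1.3.3)
print(k, min_over_uniform_subsets(values, k), lowest_k_mean)
```

The enumeration grows combinatorially with \(n\), so it is meant only as a check of the argument; in practice one would sort the values and average the \(k\) smallest directly.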


In conclusion, note that we should not, certainly, restrict ourselves only to coherent mea- 
sures. Often, it is reasonable to sacrifice some properties mentioned above in order to deal 
with more flexible characteristics of distributions. It concerns, in particular, criteria we 
consider in following sections of this chapter. 




2 COMPARISON OF R.V.’S AND LIMIT THEOREMS 


In this section, we return to the mean-value and VaR criteria and look at them from another
point of view. We saw that they were not very sophisticated (especially the former)
and do not satisfy all desirable properties (at least, the latter). Nevertheless, reasonable
decision-making rules may turn out to be close to the criteria mentioned when the corresponding
decision acts are made repeatedly. To understand this, we will proceed from the
limit theorems of Probability Theory presented in Section 0.6.


2.1 A simple model of insurance with many clients 


Consider an insurance company dealing with \(n\) clients. Let \(X_i\), \(i = 1, \ldots, n\), be the random
value of the payment to the \(i\)th client. We assume the \(X\)’s to be independent and identically
distributed (i.i.d.), which may be interpreted as if the clients come from a homogeneous
group. We keep all notations from Section 0.6.

Let \(m = E\{X_i\}\), and \(c = m + \varepsilon\), where \(\varepsilon > 0\), be the premium for each client. Thus, we
assume that the premium is, at least a bit, larger than \(m\). The total profit of the company
equals \(nc - S_n\), where \(S_n = X_1 + \ldots + X_n\). Set \(\bar X_n = S_n/n\).

The probability that the company will not suffer a loss is equal to


\[ P(nc - S_n \ge 0) = P(S_n - mn \le n\varepsilon) = P(\bar X_n - m \le \varepsilon) \ge P(|\bar X_n - m| \le \varepsilon) = 1 - P(|\bar X_n - m| > \varepsilon) \to 1 \quad\text{as } n \to \infty, \]


for any arbitrarily small \(\varepsilon > 0\), by Corollary 0.10 to the LLN in Section 0.6.

Thus, if n is “large”, for the company not to suffer a loss, the premium c should be “just a 
little bit” larger than m. In this case, the company would prefer, with regard to each client, 
the profit c — X; rather than zero. 

We see that for large n the necessary premium c may be close to the expected value m, and 
accordingly the criterion of the choice of a premium is close to the mean-value criterion. 

The “little bit” mentioned, that is, the value of \(\varepsilon\), is one of the main objects of study in
Actuarial Modeling, and we will return to it repeatedly. Here we will make just preliminary
observations.

First, note that though \(\varepsilon\) can be small, it cannot be zero.

Indeed, let \(\sigma^2 = \mathrm{Var}\{X_i\}\). Assume \(\sigma > 0\); otherwise the \(X\)’s take on just one value and the
situation is trivial. As in Section 0.6, set \(S_n^* = \frac{S_n - mn}{\sigma\sqrt{n}}\). If \(\varepsilon = 0\), then \(c = m\), and

\[ P(nc - S_n \ge 0) = P(nm - S_n \ge 0) = P(S_n - mn \le 0) = P\left(\frac{S_n - mn}{\sigma\sqrt{n}} \le 0\right) = P(S_n^* \le 0) \to \Phi(0) = \frac{1}{2} \quad\text{as } n \to \infty, \]

by the Central Limit Theorem (CLT) 0.11 in Section 0.6. So, in this case, the probability
that the company will not suffer a loss is close only to 1/2.




Note also that since the limiting normal distribution is continuous, the probability that
\(S_n^*\) equals exactly some value is close to zero. Therefore, the probability of making a profit
and the probability of not suffering a loss asymptotically, for large \(n\), are the same.

Now, let \(\varepsilon > 0\). The same CLT provides the first heuristic approximation for a reasonable
value of \(\varepsilon\). Assume that the company specifies the lowest acceptable level \(\beta\) for the probability
of not suffering a loss. For instance, the company wishes the mentioned probability
to be not less than \(\beta = 0.95\), and in the worst case to equal 0.95.

Set \(\varepsilon = a\sigma/\sqrt{n}\), where the number \(a\) is what we want to estimate. Let \(c\) be an acceptable
premium for the company. Then


\[ \beta \le P(nc - S_n \ge 0) = P(S_n - mn \le n\varepsilon) = P\left(\frac{S_n - mn}{\sigma\sqrt{n}} \le \frac{\varepsilon\sqrt{n}}{\sigma}\right) = P\left(\frac{S_n - mn}{\sigma\sqrt{n}} \le a\right) = P(S_n^* \le a). \]


By the CLT, \(P(S_n^* \le a) \approx \Phi(a)\) for large \(n\). Thus, up to normal approximation, \(\beta \le \Phi(a)\),
and hence for the premium to be acceptable, \(a\) should be not less than \(q_{\beta s}\), the \(\beta\)-quantile of
the standard normal distribution. In the boundary case, the least acceptable premium


\[ c \approx m + \frac{q_{\beta s}\,\sigma}{\sqrt{n}}. \qquad (2.1.1) \]

The sign \(\approx\) indicates that the answer is true within the accuracy of normal approximation.
For \(\beta = 0.95\), we have \(q_{\beta s} = 1.64\ldots\), and \(c \approx m + 1.64\sigma/\sqrt{n}\).


EXAMPLE 1. A special insurance pays b = $150 to passengers of an airline in the case of a serious flight delay. Assume that for each of 10,000 clients who bought such insurance, the probability of a delay is p = 0.1. In this case,

\[
X_i = \begin{cases} b & \text{with probability } p, \\ 0 & \text{with probability } 1 - p, \end{cases}
\]

\(m = bp = 15\), \(\sigma = b\sqrt{p(1-p)} = 45\) (recall the formulas for the mean and the variance of a binomial r.v.). Then for \(\beta = 0.95\), by (2.1.1), \(c \approx 15 + 1.64 \cdot 45/100 \approx 15.74\). So, a premium of $16 would be enough for the company.
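The calculation in Example 1 is easy to reproduce numerically. The following is a minimal sketch in Python, not part of the original exposition; the function name `least_acceptable_premium` is ours, and the quantile is taken from the standard-library class `statistics.NormalDist` instead of being rounded to 1.64.

```python
from math import sqrt
from statistics import NormalDist


def least_acceptable_premium(m, sigma, n, beta):
    """Premium c = m + q * sigma / sqrt(n) as in (2.1.1).

    q is the beta-quantile of the standard normal distribution.
    """
    q = NormalDist().inv_cdf(beta)
    return m + q * sigma / sqrt(n)


# Data of Example 1: payment b, delay probability p, n clients, level beta.
b, p, n, beta = 150.0, 0.1, 10_000, 0.95
m = b * p                        # mean of one payment
sigma = b * sqrt(p * (1 - p))    # standard deviation of one payment
c = least_acceptable_premium(m, sigma, n, beta)
print(f"m = {m:.2f}, sigma = {sigma:.2f}, c = {c:.2f}")
```

With the unrounded quantile the premium still rounds to 15.74, so the conclusion of the example does not depend on the rounding of the quantile.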


Note that the choice of c in (2.1.1) is closely related to the VaR criterion. For each premium c, the company compares its random profit \(nc - S_n\) with the r.v. Y = 0, the profit in the case when the company does not sell the insurance product. For the c chosen, up to normal approximation, \(\beta = P(nc - S_n \ge 0)\) and hence \(P(nc - S_n < 0) = 1 - \beta\). Thus, zero is the \((1-\beta)\)-quantile for the r.v. \(nc - S_n\). On the other hand, Y takes on only one value, zero, and this single value is the \(\gamma\)-quantile for any \(\gamma\), including \(\gamma = 1 - \beta\). (See again the definition of quantile in Section 0.1.3 and Fig.7e there.)

Thus, for the least acceptable c in (2.1.1), the \((1-\beta)\)-quantiles of the r.v.’s \(nc - S_n\) and Y coincide, that is, \(nc - S_n\) is equivalent to Y in the sense of the VaR criterion. For c larger than the value in (2.1.1), \(nc - S_n\) will be better than Y = 0.


The approach based on limit theorems is, however, far from being universal. First of all, 
the acts of making decisions are not always repeated a large number of times. We can say
so about an insurance company when it deals with a large number of clients, but a separate
client may make decisions rarely enough, and the law of large numbers (LLN) in this case 
may not work well. 

Second—and this is also important—even when limit theorems formally might work, 
real people in real situations may proceed from preferences not connected with means or 
variances. 

For example, when comparing r.v.’s as in (1.2.1) even repeatedly, people rarely consider 
such r.v.’s equivalent. Usually, the less risky alternative (as Y in (1.2.1)) is preferred to the 
more risky (as X in (1.2.1)); see Section 3.4 for more detail. So, the LLN argument does 
not work here. 

The same concerns the CLT. Assume, for instance, that an individual proceeds—perhaps 
unconsciously—from the same argument based on the CLT, as we used above. Then adding 
a small amount \(\varepsilon\) of money to X from (1.2.1) would have made a difference: the r.v. \(X + \varepsilon\)
would have been better than Y = 50. However, usually people—not companies but separate 
individuals—do not exhibit such behavior. 

We consider now an old celebrated example when the application of the LLN leads to a 
conclusion that is inconsistent with usual human behavior. 


2.2 St. Petersburg’s paradox 


The problem below was first investigated by Daniel Bernoulli in his paper [13] published in 1738 when D. Bernoulli worked in Saint Petersburg. Consider a game of chance consisting of tossing a regular coin until a head appears. Suppose that if the first head appears right away at the first toss, the payment equals 2 (say, dollars, if we update the problem to the present day). If the first head appears at the second toss, the payment equals 4, and so on; namely, if a head appears for the first time at the kth toss, the payment equals \(2^k\). It is easy to see that the payment in this case is a r.v. X taking values \(2, 4, 8, \ldots, 2^k, \ldots\) with probabilities \(\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \ldots, \frac{1}{2^k}, \ldots\), respectively, and \(E\{X\} = 2\cdot\frac{1}{2} + 4\cdot\frac{1}{4} + 8\cdot\frac{1}{8} + \ldots = 1 + 1 + 1 + \ldots = \infty\). By the LLN, this means that if the game is played repeatedly, and \(X_j\) is the payment in the jth game, then with probability one

\[
\frac{X_1 + \ldots + X_n}{n} \to \infty \quad \text{as } n \to \infty.
\]


Thus, in the long run, the average payment will be greater than an arbitrarily large number.

Then if a player had proceeded from the LLN, she/he would have agreed to pay any, arbitrarily large, entry price for participating in each play. Certainly, this does not reflect the preferences of real people: most would not agree to pay, for example, $100 each time even if they are guaranteed to participate in a large number of plays. (Would the reader agree to pay $100 each time?)


There exists a purely mathematical solution to this paradox based on the fact that in this particular case,

\[
\frac{X_1 + \ldots + X_n}{n \log_2 n} \to 1 \quad \text{in probability.}
\]

A not very short proof may be found, e.g., in [38], [120, p.57]. Thus, if the entry price for each play depends on the number of plays n and equals \(c = \log_2 n\), then for large n, the total payment for participating in n plays will be close to the total gain, and the price c would be “fair”.
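The behavior of the average payment can be explored by simulation. The sketch below is purely illustrative: the helper names `play_once` and `average_payment` and the fixed seed are our assumptions, and a single simulated run proves nothing about limits. For several values of n it prints the average payment per play together with its ratio to \(\log_2 n\), the entry price discussed above.

```python
import math
import random


def play_once(rng):
    """One St. Petersburg game: toss a fair coin until the first head.

    Returns the payment 2**k, where k is the number of the toss
    at which the first head appears.
    """
    k = 1
    while rng.random() < 0.5:   # a tail, with probability 1/2: toss again
        k += 1
    return 2 ** k


def average_payment(n, seed=0):
    """Average payment over n independent plays (seed fixed for repeatability)."""
    rng = random.Random(seed)
    return sum(play_once(rng) for _ in range(n)) / n


for n in (10**3, 10**4, 10**5):
    avg = average_payment(n)
    print(n, round(avg, 2), round(avg / math.log2(n), 2))
```

Because the payment has an infinite mean, the averages fluctuate strongly from run to run: a few rare, very large payments dominate the sum.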

This solution is strongly connected with the particular problem under consideration. Fortunately, D. Bernoulli did not know the fact mentioned and suggested a general solution that proved to be very fruitful and, in the twentieth century, led to a developed theory we consider in the next section.


3 EXPECTED UTILITY 
3.1 Expected utility maximization (EUM) 
3.1.1 Utility function 


D. Bernoulli proceeded from the simple observation that the “degree of satisfaction” of 
having capital, or in other words, the “utility of capital”, depends on the particular amount
of capital in a nonlinear way. For example, if we give $1000 to a person with a wealth of 
$1,000,000, and the same $1000 to a person with zero capital, the former will feel much 
less satisfied than the latter. 

To model this phenomenon, D. Bernoulli assumed that the satisfaction of possessing a 
capital x, or the “utility” of x, may be measured by a function u(x) that, as a rule, is not 
linear. Such a function is called a utility function, or a utility of money function. The word 
“satisfaction” would possibly reflect the significance of the definition better, but the term
“utility” has already been adopted.

The utility function, if it exists, can be viewed as a characteristic of the individual, as if
the individual is endowed with this function; so to speak, it is “built into the mind”. To some
extent, we can talk about the utility function of a company too. In this case, it reflects the 
preferences of the company. 

D. Bernoulli himself suggested as a good candidate for the “natural” utility function \(u(x) = \ln x\), assuming that the increment of the utility is proportional not to the absolute but to the relative growth of the capital. More specifically, if capital x is increased by a small dx, then the increment of the utility, du(x), is proportional to dx/x, that is,

\[
du = k\,\frac{dx}{x} \tag{3.1.1}
\]

for a constant k. The solution to this equation is \(u(x) = k\ln x + C\), where C is another constant. We will see soon that the values of k and C depend just on the choice of units in measuring utility, and hence do not matter.
Consider now a random income X. In this case, the utility of the income is the r.v. u(X). 
Bernoulli’s suggestion was to proceed from the expected utility E{u(X)}. 


EXAMPLE 1. Assume that the utility function of the player in St. Petersburg’s paradox is \(u(x) = \ln x\). Then the expected utility

\[
E\{u(X)\} = \sum_{k=1}^{\infty} u(2^k)\,2^{-k} = \sum_{k=1}^{\infty} \ln(2^k)\,2^{-k} = (\ln 2)\sum_{k=1}^{\infty} k\,2^{-k} = 2\ln 2,
\]

and, unlike E{X}, the expected utility is finite. (To realize that \(\sum_{k=1}^{\infty} k\,2^{-k} = 2\), one may compute it directly, or observe that this is the expected value of the geometric r.v. with the parameter p = 1/2. See Section 0.3.1.3.)
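The contrast between the infinite mean and the finite expected utility can be seen from partial sums. The following sketch is ours (the function names and the numbers of terms are arbitrary choices): it compares a partial sum of the series for E{u(X)} with \(2\ln 2\), and shows that the truncated mean simply equals the number of terms kept.

```python
import math


def expected_log_utility(terms=200):
    """Partial sum of sum_{k>=1} ln(2**k) * 2**(-k)."""
    return sum(k * math.log(2) * 2.0 ** (-k) for k in range(1, terms + 1))


def truncated_mean(terms):
    """Partial sum of sum_{k>=1} 2**k * 2**(-k); every term equals 1."""
    return sum(2.0 ** k * 2.0 ** (-k) for k in range(1, terms + 1))


print(expected_log_utility(), 2 * math.log(2))
print([truncated_mean(t) for t in (10, 100, 1000)])
```

The first line of output shows two practically identical numbers, while the truncated means grow without bound as more terms are included.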


Next, we consider the general case. Clearly, we can restrict ourselves to non-decreasing 
utility functions, which reflects the rule “the larger, the better or at least not worse”. 


3.1.2 Expected utility maximization criterion 


By definition, this criterion corresponds to the preference order \(\succsim\) for which

\[
X \succsim Y \iff E\{u(X)\} \ge E\{u(Y)\} \tag{3.1.2}
\]


for a utility function u. Not stating it each time explicitly, we will always assume that u(x) 
is defined on an interval (which may be the whole real line). 

The relation (3.1.2) means that among two r.v.’s, we prefer the r.v. with the larger expected utility. In particular, if E{u(X)} = E{u(Y)}, we say that \(X \sim Y\), that is, X is equivalent to Y.

If u(x) is non-decreasing (as we agreed), the rule (3.1.2) is monotone. (If \(X \ge Y\) with probability one, then \(u(X) \ge u(Y)\) with probability one too, which immediately implies that \(E\{u(X)\} \ge E\{u(Y)\}\).) In Exercise 11, we discuss strict monotonicity.

The investor who follows (3.1.2) is called an expected utility maximizer (EU maximizer; 
we will use also the same abbreviation EUM when it does not cause misunderstanding). 

It is worth emphasizing that when we are talking about an EU maximizer, we mean that 
the person’s preferences may be described by (3.1.2), or in other words that the person 
behaves as if she/he were an EU maximizer. However, this does not imply in any way that 
calculations in (3.1.2) are really running in the mind. A good image illustrating this was 
suggested in [82]. A thrown ball exhibits a trajectory described as the solution to a certain 
equation, but no one thinks that the ball “has and itself solves” this equation. People do not 
get confused about the ball but they sure do about models of other people. 


The first property of the EUM criterion. The preference order (3.1.2) does not change if u(x) is replaced by any function \(u^*(x) = bu(x) + a\), where b is a positive and a is an arbitrary number.


Indeed, if we replace u by \(u^*\) in (3.1.2), then b and a will cancel.

Thus, u may be defined up to a linear transformation, and the scale in which we measure 
utility may be chosen at our convenience. In particular, there is nothing wrong or strange if 
u assumes negative values. 


EXAMPLE 1. Consider (3.1.1) with \(u(x) = k\ln x + C\) as above. We see now that the constants k and C indeed do not matter, and we can restrict ourselves to \(u(x) = \ln x\).




EXAMPLE 2. Let \(u(x) = -\dfrac{1}{x}\), \(x > 0\); see Fig.1. Should the fact that u(x) is negative for all x’s make us uncomfortable? Not at all. Consider \(u^*(x) = u(x) + 1 = 1 - \dfrac{1}{x}\). The new function takes positive values but reflects the same preference order. The sign of u(x) does not matter; what matters when we compare X and Y is whether E{u(X)} is larger than E{u(Y)} or not.

FIGURE 1.
Consider an example of the comparison of r.v.’s.

EXAMPLE 3. (a) A reckless gambler. Let a gambler’s utility function be \(u(x) = e^x\). Negative x’s correspond to losses, and positive ones to gains. The values of u(x) for large negative x’s practically do not differ, while in the region of positive x’s the function u(x) grows very fast; see Fig.2. We may interpret it as if the gambler is not concerned about possible losses and is highly enthusiastic about large gains. Note that the function \(e^x\) is convex, and as we will see later, in Section 3.4, the convexity of the utility function corresponds to the inclination to risk.

FIGURE 2.

Consider a game in which the gambler wins a dollars with a probability of p, and loses the same amount a with the probability \(q = 1 - p\). So, we deal with \(X = \pm a\) with the mentioned probabilities. Assume p < q.


In our case, \(E\{u(X)\} = e^{a}p + e^{-a}q\). The gambler will participate in such a game if X is better than a r.v. Y = 0, which amounts to \(E\{u(X)\} \ge u(0) = 1\). This is equivalent to \(e^{a}p + e^{-a}q \ge 1\).

If we set \(e^{a} = y\), the last inequality may be reduced to the quadratic inequality \(py^2 - y + q \ge 0\). One root of the corresponding quadratic equation is one, the other is q/p. Since y > 1, the solution is \(y \ge q/p\), and consequently, \(a \ge \ln(q/p)\). Thus, the gambler is inclined to bet large stakes, and will participate in the game only if \(a \ge \ln(q/p)\). For instance, if \(p = \frac{1}{4}\), the lowest acceptable stake for the gambler is \(\ln 3 \approx 1.1\).
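The threshold for the reckless gambler can be checked numerically. In the sketch below (the function names are ours), we evaluate E{u(X)} − u(0) at a stake below the threshold, at the threshold \(\ln(q/p)\) itself, and above it.

```python
import math


def expected_utility(a, p):
    """E{u(X)} for u(x) = e**x and X = +a w.p. p, X = -a w.p. 1 - p."""
    return p * math.exp(a) + (1 - p) * math.exp(-a)


def lowest_stake(p):
    """Threshold a = ln(q/p), from p*y**2 - y + q >= 0 with y = e**a > 1."""
    return math.log((1 - p) / p)


p = 0.25
a_min = lowest_stake(p)
print(a_min)
for a in (0.5 * a_min, a_min, 2 * a_min):
    # Negative below the threshold, about zero at it, positive above it.
    print(a, expected_utility(a, p) - 1.0)
```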


(b) A cautious gambler. Consider now a gambler who views the loss of a unit of money 
as a disaster. What this unit of money is equal to, $1,000,000 or just $100, depends on the 
gambler. On the other hand, the gambler does not mind taking some risk and participating 
in a game with a moderate stake. The utility function of such a gambler may look as in 
Fig.3a: \(u(x) \to -\infty\) as \(x \to -1\), and u(x) is growing as a convex function for positive x’s. For instance, the function

\[
u(x) = \begin{cases} kx^2 & \text{for } x \ge 0, \\ \ln(1 - x^2) & \text{for } -1 < x < 0 \end{cases}
\]

has a similar graph. We wrote \(x^2\) in \(\ln(1 - x^2)\) to make the function smooth at zero. The parameter k indicates the gambler’s inclination to risk. The larger k, the steeper u(x) for positive x’s.

Consider the same r.v. X as in the example above. Then \(E\{u(X)\} = p\cdot ka^2 + q\cdot\ln(1 - a^2)\). Denote the r.-h.s. by g(a). The gambler will participate in the game if \(E\{u(X)\} \ge u(0) = 0\), which amounts to \(g(a) \ge 0\). The reader can readily verify that if \(k \le q/p\), the graph of g(a) looks as in Fig.3b, and g(a) does not assume positive values. In this case, the person we consider will refuse to play.

The graph for \(k > q/p\) is sketched in Fig.3c. In this case, g(a) is positive in a neighborhood of zero. The maximum of g(a) is attained at \(a_0\) such that \(a_0^2 = 1 - \dfrac{q}{kp}\). For example, for \(p = \frac{1}{4}\) and k = 4, we get \(a_0 = \frac{1}{2}\), so the gambler’s optimal behavior is to bet half of the unit of money.

FIGURE 3. Panels (a), (b), and (c).
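The optimal stake of the cautious gambler can be verified by brute force. The following sketch is ours: `g` is the expected utility as a function of the stake, `optimal_stake` implements the formula for \(a_0\), and the grid of stakes with step 0.001 is an arbitrary choice made only for the check.

```python
import math


def g(a, p, k):
    """E{u(X)} = p*k*a**2 + q*ln(1 - a**2) for a stake 0 <= a < 1."""
    return p * k * a * a + (1 - p) * math.log(1 - a * a)


def optimal_stake(p, k):
    """Maximizer a0 with a0**2 = 1 - q/(k*p); zero stake if k <= q/p."""
    q = 1 - p
    return math.sqrt(1 - q / (k * p)) if k * p > q else 0.0


p, k = 0.25, 4.0
a0 = optimal_stake(p, k)
grid = [i / 1000 for i in range(1000)]          # stakes 0.000, ..., 0.999
a_grid = max(grid, key=lambda a: g(a, p, k))    # brute-force maximizer
print(a0, a_grid, g(a0, p, k))
```

For k not exceeding q/p (for instance, k = 2 with the same p), the same grid search returns the zero stake, in agreement with the refusal to play described above.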


The next notion we introduce is a certainty equivalent. First, note that any number c may be viewed as a r.v. taking only one value c. Consider a preference order \(\succsim\) not necessarily connected with expected utility maximization. Assume that for a r.v. X, we can find a number c = c(X) such that \(c \sim X\) with respect to the order \(\succsim\). That is, c is equivalent to X, or the decision maker is indifferent whether to choose c or X. It may be said that the decision maker considers c an “adequate price of X”.

The number c(X) so defined is called a certainty equivalent of X.

Now let us consider an EU maximizer with a utility function u. For such a person, in accordance with (3.1.2), the relation \(c \sim X\) is equivalent to E{u(X)} = E{u(c)}, and since c is not random, E{u(X)} = u(c). If u is a one-to-one function, there exists the inverse function \(u^{-1}(y)\), and

\[
c(X) = u^{-1}(E\{u(X)\}).
\]


EXAMPLE 4. (a) In the situation of Example 3a, \(u^{-1}(x) = \ln x\). So, the certainty equivalent \(c(X) = \ln(e^{a}p + e^{-a}q)\). For example, for \(p = \frac{1}{4}\) and a = 10, we would have \(c(X) = \ln\left(\frac{1}{4}e^{10} + \frac{3}{4}e^{-10}\right) \approx 8.614\), which is close to 10. It is not surprising: the gambler does not care much about losses.

(b) Consider Example 3b for \(p = \frac{1}{4}\), k = 4. Here the situation is quite different. The gambler bets \(a = a_0 = \frac{1}{2}\). In this case, \(E\{u(X)\} = g\left(\frac{1}{2}\right) = \frac{1}{4}\cdot 4\cdot\frac{1}{4} + \frac{3}{4}\cdot\ln\left(\frac{3}{4}\right) \approx 0.034\). On the other hand, \(u^{-1}(y) = \sqrt{y/k}\) for positive y’s. So, in our case the certainty equivalent \(c(X) \approx \sqrt{0.034/4} \approx 0.0922\).
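Both certainty equivalents of Example 4 follow directly from the formula c(X) = u⁻¹(E{u(X)}). The short sketch below (variable names are ours) recomputes them without intermediate rounding; for the cautious gambler this gives a value of about 0.0925, slightly above the figure obtained from the rounded expected utility 0.034.

```python
import math

# (a) Reckless gambler: u(x) = e**x, so u^{-1}(y) = ln y.
p, a = 0.25, 10.0
ce_reckless = math.log(p * math.exp(a) + (1 - p) * math.exp(-a))

# (b) Cautious gambler: u(x) = k*x**2 for x >= 0, so u^{-1}(y) = sqrt(y/k), y >= 0.
k, a0 = 4.0, 0.5
eu = p * k * a0**2 + (1 - p) * math.log(1 - a0**2)   # expected utility g(a0)
ce_cautious = math.sqrt(eu / k)

print(round(ce_reckless, 3), round(eu, 3), round(ce_cautious, 4))
```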


FIGURE 4. Positive- and negative-power utility functions: (a) \(u(x) = x^{\alpha}\); (b) \(u(x) = -x^{-\alpha}\).


Note that the certainty equivalent of a certain number a is, of course, this number: \(c(a) = u^{-1}(E\{u(a)\}) = u^{-1}(u(a)) = a\).
3.1.3 Some “classical” examples of utility functions 


1. Positive-power functions. Let \(u(x) = x^{\alpha}\) for all \(x \ge 0\) and some \(\alpha > 0\); see Fig.4a. The expected utility in this case is considered only for positive r.v.’s, and \(E\{u(X)\} = E\{X^{\alpha}\}\), the moment of X of the order \(\alpha\). If \(\alpha = 1\), then E{u(X)} = E{X}, and the EUM criterion coincides with the mean-value criterion. For \(\alpha < 1\) the function u(x) is concave (downward), and for \(\alpha > 1\) it is convex (concave upward). We will see soon that this is strongly connected with the attitude of the investor to risk. For the u(x) we are considering, the certainty equivalent of a r.v. X is \(c(X) = (E\{X^{\alpha}\})^{1/\alpha}\). In the simplest case \(\alpha = 1\), the certainty equivalent c(X) = E{X}.


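EXAMPLE 1. Let X = b > 0 or 0 with equal probabilities. Then \(c(X) = \left(\frac{1}{2}b^{\alpha}\right)^{1/\alpha} = 2^{-1/\alpha}b\). The smaller \(\alpha\) is, the smaller the certainty equivalent. We will interpret this fact later when we consider the notion of risk aversion.

EXAMPLE 2. Let X be uniform on [0, b]. Then

\[
c(X) = \left(\frac{1}{b}\int_0^b x^{\alpha}\,dx\right)^{1/\alpha} = \left(\frac{b^{\alpha}}{1+\alpha}\right)^{1/\alpha} = \left(\frac{1}{1+\alpha}\right)^{1/\alpha} b.
\]

Because \((1+\alpha)^{1/\alpha}\) is decreasing in \(\alpha\), again the smaller \(\alpha\), the smaller the certainty equivalent.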
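The monotonicity in \(\alpha\) claimed in Examples 1 and 2 is easy to tabulate. In the following sketch the function names and the particular values of \(\alpha\) are our choices; b = 1 is taken for simplicity.

```python
def ce_two_point(b, alpha):
    """c(X) = 2**(-1/alpha) * b for X = b or 0 with equal probabilities."""
    return 2.0 ** (-1.0 / alpha) * b


def ce_uniform(b, alpha):
    """c(X) = (1 + alpha)**(-1/alpha) * b for X uniform on [0, b]."""
    return (1.0 + alpha) ** (-1.0 / alpha) * b


b = 1.0
for alpha in (0.25, 0.5, 1.0, 2.0):
    print(alpha, ce_two_point(b, alpha), ce_uniform(b, alpha))
```

In both columns the certainty equivalent increases with \(\alpha\), and for \(\alpha = 1\) it equals the mean b/2.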


2. Negative-power functions. Next, consider \(u(x) = -1/x^{\alpha}\) for all x > 0 and some \(\alpha > 0\); see Fig.4b. We again deal only with positive r.v.’s, and \(E\{u(X)\} = -E\{X^{-\alpha}\}\). The fact that u(x) is negative does not matter, but the fact that \(u(x) \to -\infty\) as \(x \to 0\) is meaningful: now the investor is much more “afraid” of being ruined than in the previous case when \(u(x) \to 0\) as \(x \to 0\). We see also that \(u(x) \to 0\) as \(x \to +\infty\), which may be interpreted as the saturation effect: the investor does not distinguish much between large values of the capital. Compare it with the previous case of the positive power where \(u(x) \to +\infty\) as \(x \to +\infty\).


Both cases above may be described by the unified formula

\[
u(x) = \frac{1}{1-\gamma}\,x^{1-\gamma}, \quad \gamma \ne 1. \tag{3.1.3}
\]

In the case \(\gamma < 1\), we have a positive power function (by the first property above, the absolute value of the multiplier \(\frac{1}{1-\gamma}\) does not matter, only the sign does). For \(\gamma > 1\), we deal with a negative power function.

3. The logarithmic utility function, \(u(x) = \ln x\), x > 0, is in a sense intermediate between the two cases above and has been already discussed.


4. Quadratic utility functions. Consider \(u(x) = 2ax - x^2\), where parameter a > 0; the multiplier 2 is written for convenience. Certainly, such a utility function is meaningful only for \(x \le a\) when the function is increasing. Hence, in this case, we consider only r.v.’s X such that \(P(X \le a) = 1\). Negative values of X are interpreted as the case when the investor loses or owes money. We have \(E\{u(X)\} = 2aE\{X\} - E\{X^2\} = 2aE\{X\} - (E\{X\})^2 - \mathrm{Var}\{X\}\). Thus, the expected utility is a quadratic function of the mean and the variance.


5. Exponential utility functions. Let \(u(x) = -e^{-\beta x}\), where parameter \(\beta > 0\), and the function is considered for all x’s. The graph is depicted in Fig.5. Since \(u(x) \to 0\), as \(x \to \infty\), faster than any power function, the saturation effect in this case is stronger than in Case 2. The expected utility \(E\{u(X)\} = -E\{e^{-\beta X}\} = -M(-\beta)\), where \(M(z) = E\{e^{zX}\}\).

The function M(z) (we also use the notation \(M_X(z)\) to emphasize that it depends on the choice of the r.v. X) is the moment generating function of X. (See a definition and examples in Section 0.4.)


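In Exercise 15, we show that the certainty equivalent

\[
c(X) = -\frac{1}{\beta}\ln\left(M_X(-\beta)\right).
\]

Consider a negative \(\beta\), setting \(\beta = -a\) for some a > 0. Then

\[
c(X) = \frac{1}{a}\ln\left(M_X(a)\right) = \frac{1}{a}\ln\left(E\{e^{aX}\}\right). \tag{3.1.4}
\]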


This is the Masset criterion popular in Economics. When X is a loss, the same 
expression appears as the premium for the coverage of X in accordance with the so 
called exponential principle. We consider it later in Section 2.4 and in Exercise 2.65. 
In particular, we compare there the cases \(\beta > 0\) and \(\beta < 0\). In Exercise 14 we consider
two important properties of the Masset criterion. 


EXAMPLE 3. Let X be distributed exponentially with parameter a. Then \(M_X(z) = a/(a - z)\). (See Section 0.4.) Now calculations lead to

\[
c(X) = \frac{1}{\beta}\left[\ln(a + \beta) - \ln a\right].
\]
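The closed form of Example 3 can be compared with a direct numerical evaluation of \(-\frac{1}{\beta}\ln E\{e^{-\beta X}\}\). The sketch below is ours: the midpoint rule, the truncation point `upper`, the number of steps, and the parameter values a = 1, \(\beta = 2\) are all arbitrary choices made only for the check.

```python
import math


def ce_formula(a, beta):
    """Closed form c(X) = (1/beta) * [ln(a + beta) - ln a]."""
    return (math.log(a + beta) - math.log(a)) / beta


def ce_numeric(a, beta, upper=60.0, steps=200_000):
    """-(1/beta) * ln E{exp(-beta*X)} for X exponential with parameter a.

    The expectation is an integral against the density a*exp(-a*x),
    computed by the midpoint rule on [0, upper].
    """
    h = upper / steps
    mgf = sum(math.exp(-beta * x) * a * math.exp(-a * x)
              for x in ((i + 0.5) * h for i in range(steps))) * h
    return -math.log(mgf) / beta


a, beta = 1.0, 2.0
print(ce_formula(a, beta), ce_numeric(a, beta), 1.0 / a)
```

The last printed number is the mean 1/a of the exponential distribution; the certainty equivalent is smaller than the mean, as one would expect for a concave utility function.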




FIGURE 5. The exponential utility function. 


In the case of exponential utility, EU maximization has an important property stated in 


Proposition 2 Let \(u(x) = -e^{-\beta x}\) and, under the EUM criterion with this utility function, \(X \succsim Y\). Then \(w + X \succsim w + Y\) for any number w.


The number w above may be interpreted as the initial wealth, and X and Y as random incomes corresponding to two investment strategies. Proposition 2 claims that in the exponential utility case, the preference relation between X and Y does not depend on the initial wealth.


Proof is straightforward. By definition, \(w + X \succsim w + Y\) iff \(E\{u(w+X)\} \ge E\{u(w+Y)\}\). For the particular u above, \(E\{u(w+X)\} = -E\{e^{-\beta(w+X)}\} = -e^{-\beta w}E\{e^{-\beta X}\}\), and the same is true for Y. So, in the last inequality, the common positive multiplier \(e^{-\beta w}\) cancels out, and the validity of the relation \(E\{u(w+X)\} \ge E\{u(w+Y)\}\) does not depend on w. Hence, if this relation is true for w = 0, it is true for all w. \(\blacksquare\)


3.2 Utility and insurance 


Consider an individual with a wealth of w, facing a possible random loss \(\xi\). Assume that the individual is an EU maximizer with a utility function u(x). What premium G would the individual be willing to pay to insure the risk?

The individual’s wealth after paying the premium will become \(X = w - G\), while if she/he does not buy the insurance, the wealth will equal the r.v. \(Y = w - \xi\).

Then in accordance with the principle (3.1.2), a premium G will be acceptable for the person under consideration only if

\[
u(w - G) \ge E\{u(w - \xi)\}. \tag{3.2.1}
\]

For the maximal accepted premium \(G_{\max}\),

\[
u(w - G_{\max}) = E\{u(w - \xi)\}. \tag{3.2.2}
\]

(\(G_{\max}\) exists if, say, u is continuous and increasing; we skip formalities.)


EXAMPLE 1. Let \(u(x) = 2x - x^2\), w = 1, and let \(\xi\) be uniformly distributed on [0,1]. Because \(w - \xi = 1 - \xi \le 1\), we deal only with x’s for which u(x) increases. Let \(y = w - G\). By (3.2.1),

\[
2y - y^2 \ge 2E\{(1 - \xi)\} - E\{(1 - \xi)^2\}.
\]

Observing that \(1 - \xi\) is also uniformly distributed on [0,1] (show it!), we have

\[
2y - y^2 \ge 2\cdot\frac{1}{2} - \frac{1}{3} = \frac{2}{3}.
\]

We are interested in \(y \le 1\). As is easy to verify, for the last inequality to be true, we should have \(y \ge 1 - \frac{1}{\sqrt{3}}\). Hence, any acceptable premium \(G \le \frac{1}{\sqrt{3}}\), and \(G_{\max} = \frac{1}{\sqrt{3}} \approx 0.57\).
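Equation (3.2.2) rarely has a closed-form solution, but for an increasing u it is easy to solve numerically. The following is a minimal sketch under our own assumptions: the loss is uniform on [0,1] as in Example 1, the expectation is computed by the midpoint rule, and the root is found by bisection; the names `expected_utility_no_insurance` and `max_premium` are ours.

```python
def expected_utility_no_insurance(u, w, steps=100_000):
    """E{u(w - xi)} for xi uniform on [0, 1], by the midpoint rule."""
    h = 1.0 / steps
    return sum(u(w - (i + 0.5) * h) for i in range(steps)) * h


def max_premium(u, w, lo=0.0, hi=1.0, tol=1e-10):
    """Solve u(w - G) = E{u(w - xi)} for G by bisection.

    Assumes u is increasing on the relevant range, G = lo is acceptable,
    and G = hi is not.
    """
    target = expected_utility_no_insurance(u, w)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if u(w - mid) >= target:   # the premium mid is still acceptable
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


u = lambda x: 2.0 * x - x * x      # quadratic utility of Example 1
g_max = max_premium(u, w=1.0)
print(g_max, 3 ** -0.5)
```

The two printed numbers agree to many digits (about 0.577), which matches the value \(1/\sqrt{3}\) found analytically.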

In the example we consider, the loss is positive with probability one. In Exercise 17, we 
provide similar calculations for the case when \(\xi = 0\) with probability 0.9, and \(\xi\) is uniformly
distributed on [0,1] with probability 0.1. This corresponds to the typical situations when 
the loss equals zero with a large probability. Nevertheless, it is worth noting that situations 
when the loss is a positive (or practically positive) r.v. are not rare, especially when we deal 
with an aggregate loss concerning a large group of clients. For example, if a university 
provides medical insurance for its employees as one insurance contract, the total loss may 
be considered a positive r.v. The same remark concerns other examples in this section. 


Next, we consider not an insured but an insurer. The latter offers the complete coverage of a loss \(\xi\) for a premium H which, in general, may be different from G above. Assume that the insurer is an EU maximizer with a utility function \(u_1(x)\) and a wealth of \(w_1\). (Actually, it is more natural to interpret \(w_1\) as an additional reserve kept by the insurer to fulfill its obligations.) Following a similar logic, we obtain that an acceptable premium H for the insurer must satisfy the inequality

\[
u_1(w_1) \le E\{u_1(w_1 + H - \xi)\}, \tag{3.2.3}
\]

and hence for the minimal accepted premium \(H_{\min}\),

\[
u_1(w_1) = E\{u_1(w_1 + H_{\min} - \xi)\}. \tag{3.2.4}
\]


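EXAMPLE 2. Let \(u_1(x) = x^{\alpha}\), \(w_1 = 1\), and \(\xi\) be the same as in Example 1. Taking again into account that \(\eta = 1 - \xi\) is uniformly distributed on [0,1], we derive from (3.2.3) that

\[
1 \le E\{(H + 1 - \xi)^{\alpha}\} = E\{(H + \eta)^{\alpha}\} = \int_0^1 (H + x)^{\alpha}\,dx = \frac{1}{\alpha + 1}\left[(H+1)^{\alpha+1} - H^{\alpha+1}\right].
\]

Hence, \(H_{\min}\) is a solution to the equation

\[
(H+1)^{\alpha+1} - H^{\alpha+1} = \alpha + 1.
\]

For example, when \(\alpha = 1/2\), it is easy to calculate, even with a simple calculator, that \(H_{\min} \approx 0.52\).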
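The equation for the insurer’s minimal premium can be solved by bisection as well. The sketch below is ours and assumes that the root lies in [0, 1], which is the case for the values of \(\alpha\) used here; the names `insurer_gap` and `min_premium` are hypothetical helpers.

```python
def insurer_gap(h, alpha):
    """(H + 1)**(alpha + 1) - H**(alpha + 1) - (alpha + 1); zero at H = H_min."""
    return (h + 1.0) ** (alpha + 1.0) - h ** (alpha + 1.0) - (alpha + 1.0)


def min_premium(alpha, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection for the root of insurer_gap on [lo, hi].

    For alpha > 0 the gap is increasing in H, so the root is unique
    whenever the gap changes sign on the bracket.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if insurer_gap(mid, alpha) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


h_min = min_premium(0.5)
print(round(h_min, 4))
```

For \(\alpha = 1\) (a risk-neutral insurer) the same routine returns 0.5, the expected loss.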


Clearly, for a premium P to be acceptable for both sides, the insurer and the insured, we should have

\[
H_{\min} \le P \le G_{\max}.
\]

Hence, if \(H_{\min} > G_{\max}\), insurance is impossible. If \(H_{\min} \le G_{\max}\), the premium will be chosen from the interval \([H_{\min}, G_{\max}]\). For instance, in the situation of Examples 1-2, we have \(0.52 \le P \le 0.57\).

If, for example, the insurer has a sort of monopoly in the market, the premium will be close to \(G_{\max}\). In the case of competition or if a law imposes restrictions on the size of premiums, we can expect the premium to be closer to \(H_{\min}\).


It is worth emphasizing that in general and in the examples above, premiums depend 
on the initial wealth w. Now we consider the special case of exponential utility, when 
premiums do not depend on wealth. 

Let \(u(x) = -e^{-\beta x}\). Due to Proposition 2, we can set w = 0 in (3.2.1)-(3.2.4). Hence, (3.2.2) is equivalent to \(-e^{\beta G_{\max}} = -E\{e^{\beta\xi}\}\), and \(G_{\max} = \frac{1}{\beta}\ln E\{e^{\beta\xi}\}\). We see in the r.-h.s. the moment generating function (m.g.f.) \(M_{\xi}(z) = E\{e^{z\xi}\}\). Thus,

\[
G_{\max} = \frac{1}{\beta}\ln M_{\xi}(\beta). \tag{3.2.5}
\]

The same formula is true for the insurer: in a similar way, we derive from (3.2.4) that in the case \(u_1(x) = -e^{-\beta_1 x}\),

\[
H_{\min} = \frac{1}{\beta_1}\ln M_{\xi}(\beta_1). \tag{3.2.6}
\]

It may be proved (see Exercise 33 and the advice there for detail) that the r.-h.s. of (3.2.5) is non-decreasing in \(\beta\). Consequently, for \(H_{\min} \le G_{\max}\), we should require \(\beta_1 \le \beta\). This fact will be interpreted when we consider the notion of risk aversion.


EXAMPLE 3. Assume that the random loss \(\xi\) may be well approximated by a normal r.v. with mean m and variance \(\sigma^2\). In this case, the m.g.f. \(M(z) = \exp\{mz + \sigma^2 z^2/2\}\); see Section 0.4.3. The reader is invited to provide simple calculations leading in this case to

\[
G_{\max} = m + \frac{\beta\sigma^2}{2}, \qquad H_{\min} = m + \frac{\beta_1\sigma^2}{2}. \tag{3.2.7}
\]

The answer looks nice and natural: the larger \(\beta\) and/or the variance, the more the premium exceeds the expected value of the loss. For \(H_{\min} \le G_{\max}\), we indeed should have \(\beta_1 \le \beta\).
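For completeness, the calculation for the normal case can be written out in one line. Substituting the normal m.g.f. into the formulas \(\frac{1}{\beta}\ln M_{\xi}(\beta)\) and \(\frac{1}{\beta_1}\ln M_{\xi}(\beta_1)\) for the two premiums gives:

```latex
\begin{align*}
G_{\max} &= \frac{1}{\beta}\ln M_{\xi}(\beta)
          = \frac{1}{\beta}\ln \exp\Bigl\{ m\beta + \frac{\sigma^2\beta^2}{2} \Bigr\}
          = \frac{1}{\beta}\Bigl( m\beta + \frac{\sigma^2\beta^2}{2} \Bigr)
          = m + \frac{\beta\sigma^2}{2},\\
H_{\min} &= \frac{1}{\beta_1}\ln M_{\xi}(\beta_1)
          = \frac{1}{\beta_1}\Bigl( m\beta_1 + \frac{\sigma^2\beta_1^2}{2} \Bigr)
          = m + \frac{\beta_1\sigma^2}{2}.
\end{align*}
```

In both cases the loading over the expected loss m is proportional to the variance and to the corresponding parameter of the exponential utility function.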


The same logic can be applied to more complicated forms of insurance. Let, for example, 
a client be willing to insure only half of a possible loss \(\xi\). Then the corresponding equation for the maximal premium will be

\[
E\{u(w - G_{\max} - \xi/2)\} = E\{u(w - \xi)\}. \tag{3.2.8}
\]


In conclusion, it is worth noting that the expected utility analysis can work well mainly when we
deal with the preferences of individual clients. This does not mean that we cannot
apply the EUM criterion to the description of the behavior of companies, but one should 
do it with caution. As we will see in later chapters, the behavior of companies may be 
determined by principles qualitatively different from those based on expected utility. 




3.3 How to determine the utility function in particular cases 


In Section 3.5.5, we will see that in the EUM case, when one considers r.v.’s taking only 
n fixed values, to completely determine the preference order, it suffices to specify n — 1 
equivalent distributions. At least theoretically it may be done by questioning the individual. 

Another way is to determine certainty equivalents, which may be illustrated by the 
following 


EXAMPLE 1. We believe that Chris is an EU maximizer, and we try to determine his 
utility function u(x). In view of the first property from Section 3.1.2, the scale in measuring 
utility does not matter, so we can set, say, u(0) = 0 and u(100) = 1, where money is 
measured in convenient units (for example, not in $1 but in $100). 

You invite Chris to compare a game with prizes X = 100 or 0, each with probability 1/2, 
with a payment of 50 for sure. Chris finds X worse than a certain payment of 50. Then you 
reduce 50 to 49,48,..., and so on, up to the moment when Chris starts to hesitate. Assume 
that it happens at c = 40. Then we can view c as the certainty equivalent of X. This means 
that \(u(c) = E\{u(X)\} = \frac{1}{2}u(100) + \frac{1}{2}u(0) = \frac{1}{2}\). Hence u(40) = 0.5, and we know the value
of u(x) at one more point. You can continue such a process, for example, figuring out how
much Chris values a r.v. \(X_1\) = 100 or 40 with equal probabilities. Assume that Chris’s
answer is 60. Then \(u(60) = \frac{1}{2}u(100) + \frac{1}{2}u(40) = \frac{3}{4}\), etc.

Similar questioning may involve insurance premiums. Suppose, for example, that Chris’s 
initial wealth is 100 units of money. (To make an example meaningful we should certainly 
assume that the units are substantially larger than $1.) Assume that, when facing a possible 
loss of the whole wealth with a probability of 0.1, Chris is willing to pay a premium of at most 25 to insure the loss. In view of (3.2.2), it means that \(u(75) = \frac{1}{10}u(0) + \frac{9}{10}u(100) = \frac{9}{10}\).


Unfortunately, in real life this does not work as well as in nice theoretical examples. The problem is not in mathematical modeling but in making results of such an inquiry reliable, reflecting the real preferences of the individual. This is a psychological rather than mathematical question. The difficulty is that answers depend on the situation, on the form in which the questions are asked, on whether the questioning involves real money or the experiment is virtual, and on many other psychological and social issues. These problems are beyond the scope of this book on mathematical modeling. For a corresponding discussion see, e.g., [56], [72], [85], [144], [147], and references therein.


3.4 Risk aversion 
3.4.1 A definition 
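Below, by the symbol \(Z_{\varepsilon}\) we will denote a r.v.

\[
Z_{\varepsilon} = \begin{cases} \varepsilon & \text{with probability } 1/2, \\ -\varepsilon & \text{with probability } 1/2, \end{cases}
\]

where \(\varepsilon > 0\). We will talk about the risk aversion of an individual with a preference order \(\succsim\) if the following condition holds.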




Condition Z: For any r.v. X, any \(\varepsilon > 0\), and any r.v. \(Z_{\varepsilon}\) independent of X, it is true that \(X \succsim X + Z_{\varepsilon}\).


Condition Z reflects the rule “the less stable, the worse”. An investor with preferences 
satisfying this property would not accept an offer resulting in either an additional income 
with probability 1/2 or a loss of the same amount and with the same probability. 

It is important to emphasize that Condition Z concerns an arbitrary preference order, not 
only the EUM criterion. 

An individual whose preference order satisfies Condition Z is called a risk averter. If \(X \precsim X + Z_{\varepsilon}\) for any X, any \(\varepsilon > 0\), and any \(Z_{\varepsilon}\) independent of X, then we call such an individual a risk lover or risk taker.

The fact that we consider in Condition Z a non-strict relation \(\succsim\) is not essential. We do it to avoid below some superfluous constructions. Formally, the above definition does not exclude the case when an individual is simultaneously a risk averter and a risk lover, that is, \(X \sim X + Z_{\varepsilon}\) for all X and \(\varepsilon\). In this case, we say that the individual is risk neutral.

Certainly, a person may be neither a risk averter nor a risk lover. For example, it may happen that for some particular X and \(\varepsilon\), it is true that \(X \succ X + Z_{\varepsilon}\), and for another r.v., say, \(X^*\), it may turn out that \(X^* \prec X^* + Z_{\varepsilon}\).


Next, we consider the EUM criterion and figure out when this particular criterion satisfies Condition Z.


Proposition 3 Let \(\succsim\) be an EUM order defined in (3.1.2). Then Condition Z holds iff u(x) is concave.


We will prove this proposition at the end of this section; now we turn to examples and 
comments. 

Usually we deal with smooth utility functions, so to check whether an EU maximizer with a utility function u is a risk averter, it suffices to check the sign of the second derivative \(u''\).

For example, for \(u(x) = x^{\alpha}\), we have \(u''(x) = \alpha(\alpha - 1)x^{\alpha - 2}\). Thus, \(u''(x) \le 0\) for \(\alpha \le 1\), which corresponds to the risk aversion case, while for \(\alpha \ge 1\) we deal with a risk lover. The case \(\alpha = 1\) when E{u(X)} = E{X} may be assigned to both types: the person is risk neutral. Other utility functions are considered in Exercise 26.

Whether a person is a risk averter or a risk lover (or neither) depends, of course, not only 
on her/his personality but on the particular situation. You may be a risk averter in routine 
life but if you have decided to spend some time in a casino, you are definitely a risk lover. 

There is also strong evidence based on experiments that many people incline to behave as risk averters when concerned with future gains (positive values of X), and as risk lovers when facing losses.

For example, a person may choose $500 for sure rather than $1,000 with probability 1/2. However, the same person may prefer to take a risk of losing $1,000 with probability 1/2 rather than to lose (only) $500 for sure. A utility function in this case may look as in Fig.6.

FIGURE 6.




Certainly, the utility function may be more complicated or, better to say, more sophisticated. For example, in the region of moderate x’s the function may be concave, and in the region of large income values, convex.


The following inequality clarifies why the concavity of utility functions is relevant to risk 
aversion. 
3.4.2 Jensen’s inequality 

We assume all expectations below to be finite. 


Proposition 4 Let a r.v. X take on values from a finite or infinite interval I, and let a
function u(x) be concave on I. Then


E{u(X)} ≤ u(E{X}). (3.4.1)

If u is convex (concave upward), then

E{u(X)} ≥ u(E{X}). (3.4.2)


The proof is relegated to Section 3.4.4. 

Being purely mathematical assertions, inequalities (3.4.1)-(3.4.2) are relevant to the basic 
question of insurance: why is it possible? 

Assume that a client of an insurance organization is a EU maximizer and consider re- 
lation (3.2.2). If the client is a risk averter (which is natural to assume since the client is 
willing to pay to insure a risk), then u(x) is concave, and by Jensen’s inequality, 


u(w − G_max) = E{u(w − ξ)} ≤ u(E{w − ξ}) = u(w − E{ξ}).

Since u is non-decreasing, it implies that w − G_max ≤ w − E{ξ}, or


G_max ≥ E{ξ}.


Thus, the maximum premium the client agrees to pay is larger than (or, for the boundary
case, equals) the average coverage of the risk, E{ξ}.

So, the company will get on the average more than it will pay, which means that the 
company can function. 

To the contrary, if the client had been a risk lover, from Jensen’s inequality it would have
followed that G_max ≤ E{ξ}, and insurance would have been impossible.


EXAMPLE 1. Consider Example 3.2-1. We computed G_max ≈ 0.57, while E{ξ} = 0.5.
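
The inequality between the maximal premium and the mean loss can also be checked numerically. The sketch below solves u(w − G_max) = E{u(w − ξ)} for a discrete loss; the wealth w = 1, the utility u(x) = √x, and the two-point loss are illustrative assumptions and are not the data of Example 3.2-1. The helper name max_premium is hypothetical.

```python
import math

def max_premium(u, u_inv, w, outcomes):
    """Solve u(w - G) = E{u(w - xi)} for G, for a discrete loss xi.

    outcomes is a list of (loss, probability) pairs.
    """
    expected_utility = sum(p * u(w - loss) for loss, p in outcomes)
    return w - u_inv(expected_utility)

# Illustrative assumptions: wealth w = 1, concave utility u(x) = sqrt(x),
# loss xi equal to 0 or 1 with probability 1/2 each.
loss = [(0.0, 0.5), (1.0, 0.5)]
g_max = max_premium(math.sqrt, lambda y: y * y, 1.0, loss)
mean_loss = sum(p * x for x, p in loss)
print(g_max, mean_loss)  # 0.75 0.5
```

For this concave utility the maximal acceptable premium exceeds the expected loss, as Jensen’s inequality predicts.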


The same argument may be applied to the certainty equivalent of a r.v. X (see Section
3.1.1). The inverse of an increasing function is increasing. From this and (3.4.1) it follows
that if u is concave, then c(X) = u^{-1}(E{u(X)}) ≤ u^{-1}(u(E{X})) = E{X}. Thus,


In the case of risk aversion, c(X) ≤ E{X}.


For the risk lover a similar argument leads to c(X) ≥ E{X}.


EXAMPLE 2. Let X be exponential with parameter a, and u(x) = −e^{−βx}, β > 0. The
function is concave, and the person with such a utility function is a risk averter. Continuing
the calculations from Example 3.1.3-3, we have


c(X) = (1/β)[ln(a + β) − ln a] = −(1/β)(ln a)[1 − ln(a + β)/ln a] = (1/β) ln(1/a) [1 − ln(a + β)/ln a]. (3.4.3)


Let X be “large”; formally, let a → 0. Then E{X} = 1/a → ∞. Since the third factor in (3.4.3)
converges to one, c(X) ~ (1/β) ln(1/a). (The relation u ~ v means u/v → 1.) Thus, in our case


c(X) ~ (1/β) ln(E{X}).


Since ln x is much smaller than x for large x’s, the certainty equivalent is much smaller than
the mean value E{X}. We interpret this as saying that the individual is a strong risk averter.
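
As a numerical illustration, the sketch below evaluates the closed form c(X) = (1/β)[ln(a + β) − ln a] = (1/β) ln(1 + β/a) and compares it with E{X} = 1/a and with the approximation (1/β) ln(E{X}). The value β = 1 and the values of a are illustrative assumptions.

```python
import math

def certainty_equivalent(a, beta):
    """c(X) = (1/beta) * ln(1 + beta/a) for X exponential with parameter a
    and u(x) = -exp(-beta * x)."""
    return math.log1p(beta / a) / beta

beta = 1.0  # illustrative value
for a in (1.0, 0.1, 0.001):
    mean = 1.0 / a
    c = certainty_equivalent(a, beta)
    approx = math.log(mean) / beta
    print(f"a={a}: E{{X}}={mean:.0f}, c(X)={c:.4f}, ln(E{{X}})/beta={approx:.4f}")
```

As a decreases, the mean grows like 1/a while the certainty equivalent grows only logarithmically.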


Route 1 ⇒ page 129


3.4.3 How to measure risk aversion in the EUM case 


We use below the Calculus notation o(ε) for a function o(ε) such that o(ε)/ε → 0 as ε → 0.
The reader who is not familiar with this very convenient and simple notation is recom- 
mended to look at the detailed explanation in Appendix, Section 4.1. 

Let x > 0 be a fixed capital. Its certainty equivalent is the same number x. Let us consider
the r.v. x + Z_ε and compute its certainty equivalent for small ε.


Lemma 5 Suppose the second derivative u'' exists and is continuous, and u'(x) > 0 for
x chosen above. Then the certainty equivalent


c(x + Z_ε) = x − (1/2)R(x)ε² + o(ε²), (3.4.4)


where


R(x) = −u''(x)/u'(x). (3.4.5)


The proof is given in Section 3.4.4; now we will discuss the significance of (3.4.4). 

By definition of o(ε²), we view the third term in (3.4.4) as negligible with respect to the
second.

The function R(x) may be considered a characteristic of the concavity of u. In particular,
if u is concave (risk aversion!), then R(x) ≥ 0.

If u is concave, by Proposition 4, the r.v.


x + Z_ε ≾ x,


and for the corresponding certainty equivalents we have

c(x + Z_ε) ≤ x.


The difference x − c(x + Z_ε) may be viewed as a “price for risk”, a “measure of riskiness”.
By Lemma 5, this measure is proportional to R(x) up to the negligible remainder o(ε²). The
characteristic R(x) is called an absolute risk aversion function, or the Arrow-Pratt index of
risk aversion.
For a r.v. X, the expectation E{R(X)} may be called an expected absolute risk aversion.
If x is measured in dollars, the dimension of R(x) is dollar^{-1}. We define the relative
risk aversion function as R_r(x) = |x|R(x). This function does not have dimension. We call
E{R_r(X)} an expected relative risk aversion.
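
The approximation of Lemma 5 can be compared with an exact computation. The sketch below assumes that Z_ε takes the values +ε and −ε with probability 1/2 each (as used in the proofs of this section) and takes u(x) = ln x, for which (3.4.5) gives R(x) = 1/x. The capital x = 10 and the values of ε are illustrative assumptions.

```python
import math

def ce_two_point(u, u_inv, x, eps):
    """Exact certainty equivalent of x + Z_eps,
    where Z_eps equals +eps or -eps with probability 1/2 each."""
    return u_inv(0.5 * u(x + eps) + 0.5 * u(x - eps))

# u(x) = ln x, so R(x) = -u''(x)/u'(x) = 1/x.
x = 10.0
for eps in (1.0, 0.1, 0.01):
    exact = ce_two_point(math.log, math.exp, x, eps)
    approx = x - 0.5 * (1.0 / x) * eps**2  # x - (1/2) R(x) eps^2
    print(eps, exact, approx)
```

For the logarithmic utility the exact value is the square root of x² − ε², so the gap between the two columns is of order ε⁴, in line with the o(ε²) remainder.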


EXAMPLE 1. Let u(x) = −e^{−βx}. Then, as is easy to calculate by substituting into
(3.4.5), the absolute risk aversion function R(x) = β and does not depend on x. In light of
this, formulas (3.2.7) from Example 3.2-3 look very nice and understandable: the larger
the risk aversion characteristics β and β1, the more the premiums G_max and H_min exceed
the expected value m. The differences G_max − m and H_min − m are proportional to β and
β1, respectively. The insurance is possible if the insurer is not more risk averse than the
insured (β1 ≤ β). This reflects reality. The insurance company can afford to be less risk
averse: it deals with many clients, the number of clients is usually much larger than
the number of claims, and payments are compensated by premiums. The corresponding
rigorous model will be considered in Section 2.3.


EXAMPLE 2. Let u(x) = x^α, x > 0, α > 0. Then R(x) = (1 − α)/x (compute on your
own), and the relative aversion R_r(x) = 1 − α. It is non-negative iff u is concave (α ≤ 1).

For u(x) = −x^{−α}, x > 0, α > 0, we have R(x) = (1 + α)/x, and the relative aversion
R_r(x) = 1 + α and is positive for all α > 0 (which is consistent with the fact that u is
concave for all α).

In Examples 3.1.3-1 and 2, we considered two types of a particular r.v. X. In the first
example, X took on two values, b > 0 and 0, with equal probabilities; in the second, X
was uniform on [0, b]. The certainty equivalents proved to be c(X) = 2^{−1/α} b and c(X) =
(1 + α)^{−1/α} b, respectively. In both cases, the smaller α (that is, the larger the relative
risk aversion 1 − α), the smaller the certainty equivalent.
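
The two certainty equivalents for u(x) = x^α can be reproduced numerically. In the sketch below the function names, the value b = 1, and the grid size are illustrative assumptions; the uniform case approximates E{X^α} by a midpoint Riemann sum rather than using the closed form.

```python
def ce_two_point(alpha, b):
    """c(X) for u(x) = x**alpha and X equal to b or 0 with probability 1/2 each."""
    expected_utility = 0.5 * b**alpha + 0.5 * 0.0**alpha
    return expected_utility ** (1.0 / alpha)

def ce_uniform(alpha, b, n=100_000):
    """c(X) for X uniform on [0, b]; E{X**alpha} is a midpoint Riemann sum."""
    expected_utility = sum(((k + 0.5) * b / n) ** alpha for k in range(n)) / n
    return expected_utility ** (1.0 / alpha)

b = 1.0
for alpha in (0.25, 0.5, 1.0):
    print(alpha,
          ce_two_point(alpha, b), 2 ** (-1 / alpha) * b,       # numeric vs closed form
          ce_uniform(alpha, b), (1 + alpha) ** (-1 / alpha) * b)
```

Both certainty equivalents increase with α, that is, they decrease as the relative risk aversion 1 − α grows.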


EXAMPLE 3. For u(x) = ln(x), x > 0, we have R(x) = 1/x, and R,(x) = 1. 


In Exercise 37, the reader is invited to prove that the cases considered above exhaust all 
cases with constant risk aversion functions. 


EXAMPLE 4. Let u(x) = x/(1 + |x|). The graph is given in Fig.7a. We see that the
person with such a utility function is a risk averter in the region of gains and a risk lover
in the case of losses (negative x’s). We observe also the saturation effect on both sides as
x → ±∞. So, we expect that at ±∞ the absolute aversion should vanish. It is true since, as
is easy to compute, R(x) = 2/(1 + |x|) for x > 0, and R(x) = −2/(1 + |x|) for x < 0. See
Fig.7b.

Since u''(0) does not exist, R(0) is not defined. However, we can write that R_r(x) =
|x|R(x) = 2x/(1 + |x|) for x ≠ 0. Then, by continuity, we may set R_r(0) = 0.




FIGURE 7. 


3.4.4 Proofs 


To make constructions below simpler, we assume additionally that functions u(x) are 
continuous. Since we deal with concave functions which are continuous on any open inter- 
val, it is a very minor assumption. 


Proof of Proposition 3. Sufficiency. Note that if u(x) is concave, then setting λ = 1/2 in
the definition (0.4.3.1), we get that for any x1, x2 from the domain of u(x),


[u(x1) + u(x2)]/2 ≤ u((x1 + x2)/2). (3.4.6)


Now, let F_X(x) and F_{Z_ε}(z) be the d.f.’s of X and Z_ε, respectively. Since X and Z_ε are
independent,


E{u(X + Z_ε)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} u(x + z) dF_{Z_ε}(z) dF_X(x) = ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} u(x + z) dF_{Z_ε}(z) ) dF_X(x)

= ∫_{−∞}^{∞} ( u(x + ε)·(1/2) + u(x − ε)·(1/2) ) dF_X(x) = E{ (1/2)u(X + ε) + (1/2)u(X − ε) }.

By virtue of (3.4.6), the r.v. in {·} above is not greater than u(X). Hence,

E{u(X + Z_ε)} ≤ E{u(X)},


and, consequently, X + Z_ε ≾ X.


(In the proof above we avoided conditional expectations (Section 0.7). Otherwise, we
could just condition on X, writing E{u(X + Z_ε)} = E{E{u(X + Z_ε) | X}} = E{(1/2)u(X + ε) +
(1/2)u(X − ε)}.)


Necessity. Let x1, x2 be two numbers such that x1 < x2. Set x0 = (x1 + x2)/2, ε = (x2 − x1)/2,
and X ≡ x0. By definition of risk aversion, E{u(X + Z_ε)} ≤ E{u(X)}. For the particular X
above, it means that (1/2)u(x0 + ε) + (1/2)u(x0 − ε) ≤ u(x0). By the choice of x0 and ε, this implies
(3.4.6). The last property is called midconcavity, and formally it does not imply concavity,
that is, (0.4.3.1) for all λ ∈ [0,1]. However, it does if u(x) is continuous (see, e.g., [55]),
which we have assumed. This proves necessity.




As a matter of fact we do not have to assume continuity. The utility function u is non-decreasing 
and hence it is bounded, either from above or from below, on any interval where it is defined. It is 
known that in this case midconcavity implies continuity on each open interval. This fact is traced to 
Jensen’s paper [65]; see also, e.g., [55]. ∎


Proof of Proposition 4. Let u be concave on an interval I, and let X take values from
I. Set m = E{X}. If X takes only one value corresponding to one of the endpoints of I,
Jensen’s inequality becomes an equality, and the proof is trivial. Assume that it is not so. Then
m is an interior point of I, and by Proposition 0-1, there exists a number c such that


u(x) − u(m) ≤ c(x − m) for any x ∈ I. (3.4.7)


(We can include the endpoints since u is assumed to be continuous. Note that as a matter
of fact this assumption is not necessary, and we made it for simplicity.)

Setting x = X, we write u(X) − u(m) ≤ c(X − m). Computing the expectations of both
sides, we have E{u(X)} − u(m) ≤ c(m − m) = 0, which amounts to (3.4.1).

To prove (3.4.2), it suffices to consider the function −u(x) which is concave if u(x) is
convex. ∎


Proof of Lemma 5. By Taylor’s expansion, u(x + Z_ε) = u(x) + u'(x)Z_ε + (1/2)u''(x)Z_ε² +
o(Z_ε²). Note also that E{Z_ε} = 0, E{Z_ε²} = ε². Hence, E{u(x + Z_ε)} = u(x) + (1/2)u''(x)ε² +
o(ε²).

The function u' is continuous, and u'(x) > 0. Hence, u'(y) > 0 for y’s from a neigh-
borhood of x. Then we can consider u^{-1}(y) and apply the Taylor expansion. Since
(u^{-1}(y))' = 1/u'(u^{-1}(y)), we have

u^{-1}(y + s) = u^{-1}(y) + (u^{-1}(y))' s + o(s) = u^{-1}(y) + s/u'(u^{-1}(y)) + o(s)

for small s. Eventually, for small ε, the certainty equivalent

c(x + Z_ε) = u^{-1}(E{u(x + Z_ε)}) = u^{-1}( u(x) + (1/2)u''(x)ε² + o(ε²) )

= u^{-1}(u(x)) + (1/u'(x)) ( (1/2)u''(x)ε² + o(ε²) ) + o( (1/2)u''(x)ε² + o(ε²) )

= x + (1/2)(u''(x)/u'(x)) ε² + (1/u'(x)) o(ε²) + o( (1/2)u''(x)ε² + o(ε²) ).

The last two terms are o(ε²).


3.5 A new perspective: EUM as a linear criterion 
3.5.1 Preferences on distributions 


Let us recall the original Probability Theory framework with a sample space Ω = {ω}
and a probability measure P(A) on it. Random variables X = X(ω) are functions on Ω.

Let F_X(B) = P(X ∈ B), where B is a set from the real line. As in Section 0.1.3.1, we call
the function of sets F_X(B) the distribution of X. In particular, the d.f. F_X(x) = F_X((−∞, x]).
Certainly, if we know F_X(B) for all B, we know F_X(x) for all x. The converse assertion is
also true: knowing the d.f. F_X(x), we can determine F_X(B) for any set B. (See (0.2.1.6) and
explanations on this point in Sections 0.1.3.1, 0.1.3.3.)




We fix a class 𝒳 = {X} of r.v.’s X and consider a preference order ≿ on 𝒳.

Suppose that, when comparing r.v.’s, we take into account not the structure of r.v.’s (that
is, how they depend on ω) but only the information about the possible values of r.v.’s and
the corresponding probabilities. In other words, we proceed only from distributions F_X and
compare not r.v.’s themselves but rather their distributions. For example, this is the case in
the EUM framework. Indeed, E{u(X)} is completely determined by the distribution F_X,
and therefore, when comparing r.v.’s, as a matter of fact, we compare the corresponding
distributions.

In the situation we have described, instead of the preference order on the class 𝒳 = {X},
it suffices to consider a preference order ≿ on the set ℱ = {F_X} of the distributions of the
r.v.’s X from 𝒳. This order is determined by the rule


F_X ≿ F_Y ⟺ X ≿ Y. (3.5.1)


Because any distribution and its d.f. completely determine each other, it does not matter
where we define the preference rule ≿: among distributions or distribution functions. Be-
low, when it does not cause misunderstanding, we use the symbol F for both. The reader
can even understand the word “distribution” as “distribution function”.

Conversely, if we agreed to compare only distributions, and if we defined somehow a
preference order on a class ℱ = {F} of distributions F, then we have defined, by the same
relation (3.5.1), the preference order among all r.v.’s having the distributions from ℱ.


3.5.2 The first stochastic dominance 


Next, we specify in terms of distributions the rule “the larger, the better”. In Section 1.1,
we introduced the natural monotonicity property: if P(X ≥ Y) = 1, then X ≿ Y. However,
if we proceed only from distributions, such a rule does not cover all situations when X is
“obviously better” than Y.


EXAMPLE 1. Let Ω consist of two outcomes, ω1 and ω2, and P(ω1) = 1/3, P(ω2) = 2/3.
Let
X(ω1) = 10, X(ω2) = 20,
Y(ω1) = 20, Y(ω2) = 10.


For example, we may view X and Y as the prices for two stocks and ω1, ω2 as two possible
states of the future financial market. With a positive probability of 1/3 the value of X
will be less than the value of Y. Nevertheless, because P(ω1) < P(ω2), if for us it is not
important which ω will occur, but merely what income we will have, we will certainly
prefer X to Y.


The purpose of the next definition is to take such cases into account. 
We say that the distribution F dominates the distribution G in the sense of the first 
stochastic dominance (FSD) if 


F(x) ≤ G(x) for any x. (3.5.2)


(That is, whatever x is, for the distribution F, the probability of having an income not larger 
than x, is not larger than that for the distribution G.) 


FIGURE 8.


Certainly, if P(X ≥ Y) = 1, then F_X(x) ≤ F_Y(x) for any x. Indeed, F_X(x) = P(X ≤ x) ≤
P(Y ≤ x) = F_Y(x). The converse assertion is not true.


EXAMPLE 2. Let us revisit Example 1. Let F(x) and G(x) be the d.f.’s of X and Y,
respectively. Their graphs are given in Fig.8. We see that (3.5.2) is true, though P(X <
Y) = 1/3 > 0.
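
Condition (3.5.2) is easy to verify mechanically for discrete distributions. The sketch below represents a distribution as a dictionary {value: probability} (an illustrative choice, as are the function names) and checks F(x) ≤ G(x) at the jump points, which suffices because both d.f.’s are step functions that are constant between jumps.

```python
from fractions import Fraction

def cdf(dist, x):
    """Distribution function of a discrete distribution {value: probability}."""
    return sum(p for v, p in dist.items() if v <= x)

def dominates_fsd(f, g):
    """True if F(x) <= G(x) at every jump point of F or G."""
    points = sorted(set(f) | set(g))
    return all(cdf(f, x) <= cdf(g, x) for x in points)

# Distributions of X and Y from Example 1.
F = {10: Fraction(1, 3), 20: Fraction(2, 3)}
G = {20: Fraction(1, 3), 10: Fraction(2, 3)}
print(dominates_fsd(F, G), dominates_fsd(G, F))  # True False
```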


In the case of the comparison of distributions, the rule “the larger, the better” is reflected 
in the following definition. 


A preference order ≿ on a set ℱ = {F} is said to be monotone
with respect to the first stochastic dominance (FSD),
if F ≿ G for any pair of distributions F, G ∈ ℱ with property (3.5.2).


For brevity, in this case we will also say that ≿ satisfies the FSD rule.

A preference order ≿ is said to be strictly monotone with respect to the FSD, if F ≻ G
(i.e., F is better than G), once (3.5.2) is true and at least for one x the inequality in (3.5.2)
is strict.


EXAMPLE 3. Consider the situation of Example 2. If a preference order ≿ satisfies
the FSD rule, then for the distributions F and G from this example, we have F ≿ G.
Moreover, if ≿ is strictly monotone, then F ≻ G.

EXAMPLE 4. Ann’s future income X has the exponential distribution with a mean of
m > 0, and Paul’s income Y is uniformly distributed on [0, 1]. For which m is Ann’s position
better in the sense of the FSD rule?

Let F(x) and G(x) be the respective d.f.’s. We should figure out when (3.5.2) is true. We
have F(x) = 1 − e^{−x/m} for x ≥ 0, and G(x) = x for x ∈ [0, 1]. Furthermore, F(0) = G(0) = 0,
F(1) < 1, and G(1) = 1; see Fig.9.

Since F(x) is concave, it may coincide with G(x) at no more than one point besides the
origin. The derivative F'(x) = e^{−x/m}/m, and G'(x) = 1 for x ∈ (0, 1).

Hence, if m ≥ 1, then F'(0) ≤ G'(0), and F'(x) < F'(0) ≤ 1 for x > 0. From this it follows that
F(x) ≤ G(x) for all x ≥ 0; see Fig.9ab.

If m < 1, then F'(0) > 1, and F(x) > G(x) for x’s from some interval; see Fig.9c.


FIGURE 9. (Panel (b): m > 1.)


Thus, if we follow the natural FSD rule, for m ≥ 1 we prefer X to Y. For m < 1, the
comparison is not so obvious, and to make a decision, we should determine the preference
order in more detail.
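
The conclusion of this example can be checked on a grid. The sketch below compares F(x) = 1 − e^{−x/m} with G(x) = x at equally spaced points of [0, 1]; for x > 1 we have G(x) = 1 ≥ F(x), so only this interval matters. The grid size and the function names are illustrative assumptions, and a grid check is a numerical sanity test rather than a proof.

```python
import math

def F(x, m):
    """D.f. of the exponential distribution with mean m."""
    return 1.0 - math.exp(-x / m) if x >= 0 else 0.0

def G(x):
    """D.f. of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

def exp_dominates_uniform(m, grid_size=10_000):
    """Check F(x) <= G(x) on a grid of [0, 1]."""
    return all(F(k / grid_size, m) <= G(k / grid_size) for k in range(grid_size + 1))

for m in (0.5, 1.0, 2.0):
    print(m, exp_dominates_uniform(m))
```

With these settings the check fails for m = 0.5 and passes for m = 1 and m = 2.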


3.5.3 The second stochastic dominance 


This section concerns a property reflecting the rule “the riskier, the worse”. We say that
F dominates G in the sense of the second stochastic dominance (SSD) if


∫_{−∞}^{t} (F(x) − G(x)) dx ≤ 0 for any t, (3.5.3)


provided that the integral above is finite for any t. Clearly, if F dominates G in the sense of
the FSD, that is, (3.5.2) is true, then (3.5.3) is also true. (Since in this case, the integrand in
(3.5.3) is non-positive.)


EXAMPLE 1. To clarify (3.5.3), let us recall the definition of risk aversion from Section
3.4.1. Let a r.v. X have a distribution F, and let G_ε be the distribution of the r.v. X_ε = X + Z_ε,
as it was defined in Section 3.4.1. Then


G_ε(x) = P(X + Z_ε ≤ x) = (1/2)P(X + ε ≤ x) + (1/2)P(X − ε ≤ x)

= (1/2)(P(X ≤ x − ε) + P(X ≤ x + ε)) = (1/2)(F(x − ε) + F(x + ε)). (3.5.4)

The distribution F dominates G_ε in the sense of the SSD. Indeed, set Q(t) = ∫_{−∞}^{t} F(x)dx.
Note that ∫_{−∞}^{t} F(x + ε)dx = ∫_{−∞}^{t+ε} F(x)dx = Q(t + ε), and similarly ∫_{−∞}^{t} F(x − ε)dx = Q(t − ε).
Then, in view of (3.5.4),


∫_{−∞}^{t} (F(x) − G_ε(x)) dx = Q(t) − (1/2)(Q(t − ε) + Q(t + ε)).


To show that the last expression is non-positive for any t, it remains to prove that Q(t) is
a convex function; see Appendix, Section 4.3. Assume, for simplicity, that F(x) has the
density f(x) = F'(x). Then Q'(t) = F(t), Q''(t) = f(t) ≥ 0, since f is a density. As a
matter of fact, the smoothness of F(x) is not necessary: it suffices to take into account that
F(x) is non-decreasing.
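
For discrete distributions, the SSD condition can be tested exactly. The sketch below relies on the identity Q(t) = ∫_{−∞}^{t} F(x)dx = E{(t − X)⁺}, a standard fact that is not stated in the text, and on the observation that for discrete distributions the difference of two such functions is piecewise linear, so it is enough to compare them at the support points. The distribution F, the value ε = 5, and all names are illustrative assumptions.

```python
from fractions import Fraction

def integrated_cdf(dist, t):
    """Q(t) = integral of F over (-inf, t], computed as E{(t - X)^+}."""
    return sum(p * max(t - v, 0) for v, p in dist.items())

def spread(dist, eps):
    """Distribution of X + Z_eps, with Z_eps = +eps or -eps (probability 1/2 each),
    independent of X."""
    out = {}
    for v, p in dist.items():
        for shift in (-eps, eps):
            out[v + shift] = out.get(v + shift, 0) + p / 2
    return out

def dominates_ssd(f, g):
    """True if Q_F(t) <= Q_G(t) at every support point of F or G."""
    points = sorted(set(f) | set(g))
    return all(integrated_cdf(f, t) <= integrated_cdf(g, t) for t in points)

F = {10: Fraction(1, 3), 20: Fraction(2, 3)}
G_eps = spread(F, 5)
print(dominates_ssd(F, G_eps), dominates_ssd(G_eps, F))  # True False
```

In this instance F does not dominate G_ε in the sense of the FSD (at x = 10 the d.f. of F is larger), so the SSD comparison is what distinguishes the two.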




A preference order ≿ on a set ℱ = {F} is said to be monotone with respect to the SSD,
if F ≿ G for any pair of distributions F, G ∈ ℱ with property (3.5.3).


This is a stronger requirement on the order ≿ than the monotonicity with respect to the
FSD: if ≿ is monotone with respect to the SSD, then it is monotone with respect to the
FSD. (Not vice versa! Let us double check the logic of the implication. Assume that ≿ is
monotone with respect to the SSD. Let F dominate G in the sense of the FSD. Then (3.5.3)
is true. Then F ≿ G. Hence, ≿ is monotone with respect to the FSD also.)

We say that an individual is a risk averter in the sense of the SSD if her/his preference
order ≿ is monotone with respect to the SSD.

From Example 1 it follows that if somebody is a risk averter in the sense of the SSD,
then she/he is a risk averter in the sense of Section 3.4.1.


3.5.4 The EUM criterion 


We return to the EUM criterion. For a r.v. X, the expected value E{u(X)} may be written
as


E{u(X)} = ∫_{−∞}^{∞} u(x) dF(x), (3.5.5)


where F(x) is the distribution function of X.

In more detail, this formula is discussed in Section 0.1.3. In particular, if X has a prob-
ability density function f(x), then dF(x) = f(x)dx, and the integral above may be under-
stood in the “usual” way as ∫_{−∞}^{∞} u(x) f(x)dx.

If X is discrete and assumes values x1, x2, ... with probabilities f1, f2, ..., respectively,
then we set dF(x_j) = f_j for all j, and dF(x) = 0 for all other x’s. This will lead to
∫_{−∞}^{∞} u(x)dF(x) = Σ_j u(x_j) f_j, that is, to the definition of expected value in the discrete case.

As was already mentioned, since E{u(X)} is completely determined by the d.f. F of X,
we may restrict ourselves to the corresponding order ≿ on a set ℱ = {F} of distributions.

Let us fix a utility function u and set


U(F) = ∫_{−∞}^{∞} u(x) dF(x), (3.5.6)


assuming that the last integral is finite for all F ∈ ℱ.

We will call U(F) a utility functional. The word functional is used in Mathematics when
the argument in a function (as in U(F)) is not a number but an object of a more general
nature (as a distribution F) but the values of the function are real numbers. It is convenient
to use this common mathematical term here in order to distinguish the utility functional
U(F) from the utility function u(x).

The EUM preference order ≿ in ℱ may be defined as


F ≿ G ⟺ U(F) ≥ U(G),


that is, ≿ is preserved by U.




Certainly, the criterion we have defined is the same criterion as above, just presented in 
terms of distributions. 


Consider the difference U(F) − U(G). Integrating by parts, we have


U(F) − U(G) = ∫_{−∞}^{∞} u(x) d(F(x) − G(x)) = ∫_{−∞}^{∞} (G(x) − F(x)) du(x). (3.5.7)


(The differences F(∞) − G(∞) = 1 − 1 = 0, F(−∞) − G(−∞) = 0 − 0 = 0; the limits of
u(x)(F(x) − G(x)) at ±∞ equal zero; see, e.g., (0.2.5.5) and the argumentation there.)


Let F dominate G in the sense of the FSD, and the utility function u(x) be non-decreasing.
Then G(x) − F(x) ≥ 0, du(x) ≥ 0, and (3.5.7) implies that U(F) ≥ U(G). Thus, we can
state the following:


If u(x) is non-decreasing, then U(F) is monotone with respect to the FSD.


The above condition is also necessary. Indeed, let x1 > x2, a r.v. X1 assume just one
value x1, and a r.v. X2 assume only one value x2. Let F and G be the d.f.’s of X1 and
X2, respectively. Then F dominates G in the sense of the FSD (show why). If U(F) is
monotone with respect to the FSD, then U(F) ≥ U(G). On the other hand, in the EUM
case, U(F) = E{u(X1)} = u(x1), and U(G) = u(x2). So, u(x1) ≥ u(x2).

▷ Consider now risk aversion. Suppose that F dominates G in the sense of the SSD and
the utility function u(x) is concave. For simplicity, assume that u is sufficiently smooth.
Integrating (3.5.7) by parts one time more, we have


U(F) − U(G) = ∫_{−∞}^{∞} (G(x) − F(x)) u'(x) dx = lim_{A→∞} ∫_{−∞}^{A} (G(x) − F(x)) u'(x) dx

= lim_{A→∞} [ u'(A) ( ∫_{−∞}^{A} (G(s) − F(s)) ds ) + ∫_{−∞}^{A} ( ∫_{−∞}^{x} (F(s) − G(s)) ds ) u''(x) dx ]

≥ ∫_{−∞}^{∞} ( ∫_{−∞}^{x} (F(s) − G(s)) ds ) u''(x) dx. (3.5.8)


(We took into account that u'(A) ≥ 0 and ∫_{−∞}^{A} (G(s) − F(s)) ds above are non-negative.)
Furthermore, because ∫_{−∞}^{x} (F(s) − G(s)) ds ≤ 0 and u''(x) ≤ 0, we have U(F) ≥ U(G). Thus,


If u(x) is concave and non-decreasing, then U(F) is monotone with respect to the SSD.


It may be proved that in the case of EUM, the definitions of risk aversion in the sense
of Condition Z from Section 3.4.1 and in the sense of the SSD are equivalent. In view of
Proposition 3, from this it follows, in particular, that the above concavity condition is also
necessary. ◁




3.5.5 Linearity of the utility functional 


Let F1 and F2 be two distributions, and let a number α ∈ [0,1]. We call the distribution
F^(α) a mixture of distributions F1, F2 if


F^(α) = αF1 + (1 − α)F2, (3.5.9)


that is, the d.f. F^(α)(x) = αF1(x) + (1 − α)F2(x), and in general, F^(α)(B) = αF1(B) + (1 −
α)F2(B).

The reader is recommended to look up this notion and comments on it in Section 0.1.3.5.
In particular, we mentioned there that the r.v. X having the distribution F^(α) may be repre-
sented as

X = X1 with probability α,
X = X2 with probability 1 − α,


where X1, X2 have the distributions F1, F2, respectively. For a particular example, see also
Exercise 41.
Now, let us consider the concept of mixture from a geometric point of view. 


EXAMPLE 1. Consider r.v.’s taking only three fixed values, x1, x2, and x3, such that
x1 < x2 < x3. Any distribution F of such a r.v. may be identified with the probability
vector (p1, p2, p3), where p_i is the probability of the value x_i. Since p1 + p2 + p3 = 1,
to specify the distribution, it is enough to know two probabilities from these three. For
us, it will be convenient to choose (p1, p3). All points (p1, p3) lie in the triangular region
Δ = {(p1, p3) : p1, p3 ≥ 0, p1 + p3 ≤ 1} depicted in Fig.10a.

Any distribution F (from those we are considering) may be identified with a point in
the triangle Δ. The origin corresponds to the probability vector (0, 1, 0), the “North-East”
border to vectors (p1, 0, p3), etc. If F is the distribution of income, the “best” point is
certainly the point (0, 1).

Consider two points in Δ, say, p^1 and p^2, corresponding to some distributions F1 and
F2. Then the mixture αF1 + (1 − α)F2 will correspond to the point p^(α) = αp^1 + (1 − α)p^2
lying in the segment connecting p^1 and p^2, and such that the distances from this point to
points p^1 and p^2 are in the proportion (1 − α)/α. In particular, p^(1/2) lies in the middle of
this segment. See also Fig.10a.

Certainly, this scheme may be generalized to the case of n values x1, ..., xn such that x1 <
... < xn, whose probabilities are p1, ..., pn, respectively. Since p1 + ... + pn = 1, it suffices
to consider n − 1 probabilities, say, p2, ..., pn, and the counterpart of Δ is the set Δ^(n) =
{(p2, ..., pn) : p_i ≥ 0, p2 + ... + pn ≤ 1}. For n = 4, it is a three-dimensional tetrahedron;
see Fig.10b. For n > 4 it is a multidimensional prism. Mixtures of two distributions lie in
a line in Δ^(n).


For brevity, we call this illustrative scheme the Δ-scheme; we will repeatedly return to it.

In the general framework, a mixture admits the same interpretation. If we view distri-
butions F as points in a space ℱ, then for fixed points F1 and F2, the collection of points
F^(α) = αF1 + (1 − α)F2 for α ∈ [0,1] may be viewed as the segment connecting F1 and F2.
When α runs from 0 to 1, the distribution F^(α) varies from F2 to F1.


FIGURE 10.


The main property of the EUM criterion:


U(αF1 + (1 − α)F2) = αU(F1) + (1 − α)U(F2). (3.5.10)


That is, the utility of a mixture is equal to the mixture of the utilities, or, in other terms, U is
a linear functional. (More precisely, functionals for which (3.5.10) holds only for α ∈ [0, 1]
are called affine; we keep the more explicit term ‘linear’.)


To prove (3.5.10), it suffices to replace F in (3.5.6) by the mixture (3.5.9) and to write


U(F^(α)) = ∫_{−∞}^{∞} u(x) dF^(α)(x) = ∫_{−∞}^{∞} u(x) d(αF1(x) + (1 − α)F2(x))

= α ∫_{−∞}^{∞} u(x) dF1(x) + (1 − α) ∫_{−∞}^{∞} u(x) dF2(x) = αU(F1) + (1 − α)U(F2).


EXAMPLE 2. Consider again the Δ-scheme for n = 3. Any rule of comparison of distri-
butions in this particular case amounts to a rule of comparison of points in Δ. Identifying
distributions F with vectors (p1, p2, p3), we write the expectation U(F) as


U(F) = u(x1)p1 + u(x2)p2 + u(x3)p3 = u(x1)p1 + u(x2)(1 − p1 − p3) + u(x3)p3
= [u(x1) − u(x2)]p1 + [u(x3) − u(x2)]p3 + u(x2) = a·p1 + b·p3 + h,


where a = u(x1) − u(x2), b = u(x3) − u(x2), h = u(x2).
Thus, U(F) is a linear function of p1 and p3.
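
Property (3.5.10) can be confirmed numerically for discrete distributions. In the sketch below the three values 0, 20, 40, the utility u(x) = √x, the two probability vectors, and α = 0.3 are illustrative assumptions, and the function names are hypothetical.

```python
import math

def utility_functional(u, dist):
    """U(F) = sum of u(x_j) * p_j for a discrete distribution {value: probability}."""
    return sum(p * u(x) for x, p in dist.items())

def mixture(alpha, f1, f2):
    """Distribution alpha*F1 + (1 - alpha)*F2."""
    values = set(f1) | set(f2)
    return {x: alpha * f1.get(x, 0.0) + (1 - alpha) * f2.get(x, 0.0) for x in values}

# Illustrative assumptions: values 0, 20, 40 and u(x) = sqrt(x).
F1 = {0: 0.5, 20: 0.25, 40: 0.25}
F2 = {0: 0.1, 20: 0.6, 40: 0.3}
alpha = 0.3
lhs = utility_functional(math.sqrt, mixture(alpha, F1, F2))
rhs = (alpha * utility_functional(math.sqrt, F1)
       + (1 - alpha) * utility_functional(math.sqrt, F2))
print(math.isclose(lhs, rhs))  # True
```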


Let us fix an order ≿ on ℱ. We call a set ℱ0 ⊆ ℱ of distributions an equivalence set, or
equivalence class, if F ~ G for any F, G ∈ ℱ0. In other words, any equivalence set contains
only distributions equivalent to each other.

We see from (3.5.10) that under an EUM criterion, such a set is linear in the sense that
if F, G ∈ ℱ0, then any mixture αF + (1 − α)G also belongs to ℱ0. Indeed, let U(F) = U(G).
Then U(αF + (1 − α)G) = αU(F) + (1 − α)U(G) = αU(F) + (1 − α)U(F) = U(F).


EXAMPLE 3. In the Δ-scheme, n = 3, for points (p1, p3) from Δ to be equivalent, the
corresponding values of U(F) = a·p1 + b·p3 + h should be equal to the same constant. In
other words, an equivalence set is a set of points

a·p1 + b·p3 + h = d, (3.5.11)


where d is a constant. This is a line, or more precisely, the part of the line (3.5.11) lying in
Δ. This part is, certainly, a segment; see Fig.11a. All equivalence lines are parallel with the
slope

k = −a/b = [u(x2) − u(x1)]/[u(x3) − u(x2)]. (3.5.12)


FIGURE 11.


EXAMPLE 4. Let u(x) be strictly increasing. Consider in Δ two parallel lines with the
slope k from (3.5.12). Points in each line are equivalent, points lying in different lines are
not. Which line is better?

Answer: The higher. Intuitively it is clear that the closer to the best point (0, 1), the
better. For a rigorous proof, consider a point p^0 = (0, p_3^0) in the vertical axis and denote
by F^0 the corresponding distribution (see Fig.11a). Clearly, U(F^0) = b·p_3^0 + h = [u(x3) −
u(x2)]p_3^0 + u(x2). The function u is increasing, and hence u(x3) > u(x2) because x3 > x2.
Then U(F^0) is increasing in p_3^0, and the higher the point p^0 is, the better it is. On the other
hand, the higher p^0, the higher the line starting from this point with the slope k.


EXAMPLE 5. Ann is an EU maximizer. You ask her to compare two random variables
both taking the same three values x1, x2, x3; the first r.v. with probabilities (1/3, 1/6, 1/2), re-
spectively, the second with probabilities (1/4, 1/2, 1/4). It has turned out that Ann found these
distributions equivalent for her. Is this information enough to completely determine Ann’s
preferences among ALL random variables taking the same values?

Answer: Yes. Since the points mentioned are equivalent, the line going through these
points and having the slope (1/2 − 1/4)/(1/3 − 1/4) = 3 is an equivalence line. Hence, k = 3.
Consider two other points, p^1 and p^2, in Δ, and draw the line l with the same slope k going
through p^1; see Fig.11b. If the point p^2 turns out to be below l, then p^2 is worse than p^1; if
p^2 is above l, then p^2 is better.
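
The procedure of this example can be written as a short computation on points (p1, p3) of the triangle. In the sketch below the two equivalent points are illustrative values chosen so that the slope equals 3, the test points are arbitrary, and the function names are hypothetical. The comparison rule assumes a strictly increasing u, so that a higher equivalence line is better (Example 4).

```python
from fractions import Fraction as Fr

def slope(p, q):
    """Slope of the line through two points (p1, p3)."""
    return (q[1] - p[1]) / (q[0] - p[0])

def compare(p_a, p_b, k):
    """Return 1 if p_b lies above the equivalence line through p_a (p_b is better),
    -1 if it lies below (p_b is worse), and 0 if it lies on the line."""
    line_at_b = p_a[1] + k * (p_b[0] - p_a[0])
    return (p_b[1] > line_at_b) - (p_b[1] < line_at_b)

# Two points assumed to be equivalent; coordinates are (p1, p3).
e1 = (Fr(1, 3), Fr(1, 2))
e2 = (Fr(1, 4), Fr(1, 4))
k = slope(e2, e1)
print(k)  # 3

# Compare two arbitrary points of the triangle.
print(compare((Fr(1, 2), Fr(1, 4)), (Fr(1, 5), Fr(1, 5)), k))  # 1
```

The second printed value is 1 because the point (1/5, 1/5) lies above the line of slope 3 through (1/2, 1/4), whose height at p1 = 1/5 is negative.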


Certainly, the particular probabilities in Example 5 do not matter: in the Δ-scheme for n =
3, in order to completely determine the preference order, it suffices to find two equivalent
distributions.




EXAMPLE 6. Assume that in the previous example, the values x1, x2, x3 are equally
spaced: x2 − x1 = x3 − x2. For instance, x1 = 0, x2 = 20, x3 = 40. Then, for a concave
u(x), we have k ≥ 1. Indeed, in our case, x2 = (x1 + x3)/2, and by property (3.4.6),
u(x2) ≥ [u(x1) + u(x3)]/2. This implies that u(x2) − u(x1) ≥ u(x3) − u(x2). Hence, in the
risk aversion case, the slope of equivalence lines corresponds to an angle of at least 45°.


The reader certainly sees that the above reasoning may be easily generalized to the Δ-
scheme for n > 3. In this case, equivalence sets are planes for n = 4, and hyper-planes in
R^{n−1} for n > 4. We come to a general conclusion:


To completely determine the preference order among all distributions
concentrated at n fixed points, it is enough to know n − 1 equivalent distributions.


In Exercise 44, we justify it rigorously. Risk aversion is again connected with the position 
of equivalence hyper-planes; we skip details. 


3.5.6 An axiomatic approach 


D. Bernoulli did not derive the EUM criterion from some original assumptions: he sim- 
ply suggested it and gave an argument for why this criterion seems natural. The modern 
approach to such problems is more sophisticated. We do not point out a good solution from 
the very beginning, but first establish desirable properties of the solution, called axioms. 
After that, we try to figure out which solutions satisfy the properties established. 

In Utility Theory such an approach was first applied in 1944, more than 200 years after 
D. Bernoulli’s paper [13], by J. von Neumann and O. Morgenstern in [96]. 


The basic axiom. Consider a preference order ≿ on a set ℱ = {F} of distributions F.


Axiom 6 (the Independence Axiom). Let F, G, and H belong to ℱ, and F ≿ G. Then for
any α ∈ [0,1],
αF + (1 − α)H ≿ αG + (1 − α)H.


The axiom sounds quite plausible: if you mix F and G with the same distribution H,
then the relation between the mixtures will be the same as for the original distributions F
and G. (We will see, however, in Section 4 that this is far from being always true.)

A geometric illustration is given in Fig.12. Let a “point” F be “better” than G. Consider
the “segments” connecting F and H, and G and H. Let us choose two points, αF + (1 − α)H
and αG + (1 − α)H, one in each segment, in a way that their positions between F and H,
and G and H, respectively, would be in the same “proportion”.
Then these points are in the same relation as the original points F and G.


FIGURE 12.


EXAMPLE 1. John, when comparing two random variables,

\[X_1 = \begin{cases} \$100 & \text{with probability } 0.2,\\ \$0 & \text{with probability } 0.8, \end{cases} \qquad X_2 = \begin{cases} \$50 & \text{with probability } 0.4,\\ \$0 & \text{with probability } 0.6, \end{cases}\]

has decided that for him \(X_2 \succ X_1\). (John seems to be a strong risk averter.) After that John
is offered to play one of two games. In both games, a coin will be tossed, and in the case
of a head, John will get $100. In the case of a tail, John will get a random prize: the r.v. \(X_1\)
in the first game, and the r.v. \(X_2\) in the second. Thus, eventually, the prizes for the games are

\[Y_1 = \begin{cases} \$100 & \text{with probability } 0.5 + 0.5\cdot 0.2 = 0.6,\\ \$0 & \text{with probability } 0.4, \end{cases} \qquad Y_2 = \begin{cases} \$100 & \text{with probability } 0.5,\\ \$50 & \text{with probability } 0.5\cdot 0.4 = 0.2,\\ \$0 & \text{with probability } 0.3, \end{cases}\]

respectively. If John, consciously or not, follows the Independence Axiom, he would prefer
\(Y_2\) to \(Y_1\).
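The mixtures in this example can be recomputed mechanically. The sketch below is only an illustration: the helper names `mix` and `expected_utility` are hypothetical, and the square-root utility is an assumption chosen to represent a risk averter, not something taken from the text. It builds \(Y_1\) and \(Y_2\) as mixtures with the sure prize $100 and checks that, under expected utility, the gap between \(Y_2\) and \(Y_1\) is exactly half the gap between \(X_2\) and \(X_1\), so the ranking cannot flip.

```python
import math
from fractions import Fraction as Fr

def mix(alpha, F, H):
    """Mixture alpha*F + (1 - alpha)*H of two discrete distributions (dicts value -> prob)."""
    out = {}
    for dist, weight in ((F, alpha), (H, 1 - alpha)):
        for value, prob in dist.items():
            out[value] = out.get(value, 0) + weight * prob
    return out

def expected_utility(dist, u):
    """E{u(X)} for a discrete distribution."""
    return sum(float(prob) * u(value) for value, prob in dist.items())

X1 = {100: Fr(1, 5), 0: Fr(4, 5)}
X2 = {50: Fr(2, 5), 0: Fr(3, 5)}
H = {100: Fr(1)}                      # the sure $100 paid on heads

Y1 = mix(Fr(1, 2), X1, H)
Y2 = mix(Fr(1, 2), X2, H)

u = math.sqrt                         # assumed concave utility, for illustration only
gap_X = expected_utility(X2, u) - expected_utility(X1, u)
gap_Y = expected_utility(Y2, u) - expected_utility(Y1, u)
print(Y1, Y2)
print(gap_X, gap_Y)                   # gap_Y equals gap_X / 2, so the sign is preserved
```

Exact fractions are used for the probabilities so that the mixture weights can be compared with the values stated in the example without rounding noise.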


Proposition 7 The EUM criterion satisfies the Independence Axiom. 


Proof is practically immediate. In the EUM case, if \(F \succsim G\), then \(U(F) \ge U(G)\). Hence,
by (3.5.10), \(U(\alpha F + (1-\alpha)H) = \alpha U(F) + (1-\alpha)U(H) \ge \alpha U(G) + (1-\alpha)U(H) =
U(\alpha G + (1-\alpha)H)\). Consequently, \(\alpha F + (1-\alpha)H \succsim \alpha G + (1-\alpha)H\).


The main result of classical utility theory is that under some additional, more technical, 
conditions, the converse claim is also true: 


The Independence Axiom implies the EUM principle.


We skip here details and a rigorous formulation; for a detailed exposition see, e.g., [82]. 


Route 2 ⇒ page 115


Linearity and continuity lead to EUM. Consider a somewhat weaker proposition illus- 
trating, to a certain extent, the situation. 

We saw that the utility functional (3.5.6) was linear, that is, \(U(\alpha F_1 + (1-\alpha)F_2) =
\alpha U(F_1) + (1-\alpha)U(F_2)\). The question we discuss here is whether any linear utility
functional admits the representation (3.5.6). (That is, instead of the Independence Axiom, we
consider the linearity property itself.) The answer is “yes, if \(U(F)\) is in a certain sense
continuous”.

To specify what this means, first note that convergence of distributions may be defined 
in different ways. We consider two. 


1. Weak convergence. We say that a sequence \(F_n\) converges to a distribution \(F\) weakly,
and write it as \(F_n \xrightarrow{w} F\), if \(F_n(x) \to F(x)\) for any \(x\) at which \(F(x)\) is continuous.

2. Convergence for all sets. We say that a sequence \(F_n\) converges to a distribution \(F\) for
all sets, and write it as \(F_n \xrightarrow{s} F\), if \(F_n(B) \to F(B)\) for any \(B\).

For more detail on convergence of r.v.’s and distributions, see Section 0.5.

Clearly, convergence in the latter case implies convergence in the former.

Accordingly, we consider two definitions of continuity of a functional \(U(F)\).

Condition C1. (Weak continuity): \(U(F_n) \to U(F)\) provided \(F_n \xrightarrow{w} F\).

Condition C2. (Continuity with respect to convergence for all sets): \(U(F_n) \to U(F)\)
provided \(F_n \xrightarrow{s} F\).

Condition C1 is stronger than Condition C2 since convergence \(U(F_n) \to U(F)\) in the
former case takes place under a weaker (!) requirement on the convergence of \(F_n\).


Theorem 8 Suppose \(U(F)\) is defined on the set of all distributions and is linear, i.e.,
(3.5.10) is true. Then, if Condition C2 holds, there exists a bounded function \(u(x)\) such
that

\[U(F) = \int_{-\infty}^{\infty} u(x)\,dF(x). \tag{3.5.13}\]

If Condition C1 holds, the function \(u(x)\) in (3.5.13) is continuous.


(A proof may be found in [40], [120].) 


4 NON-LINEAR CRITERIA 


The EUM approach may be considered as a first approximation to the description of
people’s preferences. Over the years, there has been a great deal of discussion about the
adequacy of this approach. Many experiments have been conducted and a number of examples
have been constructed, showing that the EUM approach is far from being adequate in
all situations. The existence of such examples is not surprising; on the contrary, it would
have been surprising if the behavior of such sophisticated (and sometimes strange) creatures
as human beings had always been well described by simple linear functions. This
section concerns some elements of modern utility theory.


4.1 Allais’ paradox 


The following example considered by M. Allais [2] is probably the most famous. Though
contrived, it is very illustrative. Consider distributions \(F_1, F_2, F_3, F_4\) of a random
income with values $0, $10 million, or $30 million. The corresponding probabilities are
given in the following table.

      | $0   | $10 million | $30 million
F_1   | 0    | 1           | 0
F_2   | 0.01 | 0.89        | 0.1
F_3   | 0.9  | 0           | 0.1
F_4   | 0.89 | 0.11        | 0


Apparently, a majority of people would prefer \(F_1\) to \(F_2\), reasoning as follows. Ten million
dollars is a lot of money, for ordinary people as inconceivable as thirty million. So, it is
better to get ten for sure than to go in pursuit of thirty million at the risk of receiving nothing
(even if the probability of this is very small). Thus, \(F_1 \succ F_2\).

The situation with \(F_3\) and \(F_4\) is different. Now the probabilities of receiving nothing are
large (and hence one should be ready to lose), and these probabilities are practically the
same. Then it is reasonable to choose the variant with the larger prize. So, \(F_3 \succ F_4\).

Let us consider the mixtures \(\tfrac12 F_1 + \tfrac12 F_3\) and \(\tfrac12 F_2 + \tfrac12 F_4\). If the preference \(\succsim\) had been
preserved by a utility functional (3.5.5), in the light of the linearity property (3.5.10), we
would have had
\[\tfrac12 F_1 + \tfrac12 F_3 \succ \tfrac12 F_2 + \tfrac12 F_4.\]
However, as a matter of fact, as is easy to calculate, \(\tfrac12 F_1 + \tfrac12 F_3 = \tfrac12 F_2 + \tfrac12 F_4\).
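The equality of the two mixtures is a one-line computation, and it can be checked exactly. The sketch below uses the probabilities from the table; the helper names and the particular utility values are hypothetical and serve only to illustrate that, for any utilities of the three outcomes, the expected-utility differences \(U(F_1) - U(F_2)\) and \(U(F_3) - U(F_4)\) are negatives of each other, so an expected-utility maximizer cannot hold both strict preferences.

```python
from fractions import Fraction as Fr

# Probabilities of ($0, $10 million, $30 million), copied from the table.
F1 = (Fr(0), Fr(1), Fr(0))
F2 = (Fr(1, 100), Fr(89, 100), Fr(1, 10))
F3 = (Fr(9, 10), Fr(0), Fr(1, 10))
F4 = (Fr(89, 100), Fr(11, 100), Fr(0))

def half_mix(F, G):
    """The mixture (1/2)F + (1/2)G, computed componentwise."""
    return tuple((p + q) / 2 for p, q in zip(F, G))

def expected_utility(F, utilities):
    """Sum of probability times utility over the three outcomes."""
    return sum(p * u for p, u in zip(F, utilities))

left = half_mix(F1, F3)
right = half_mix(F2, F4)
print(left == right, left)

utilities = (0, 7, 9)   # assumed utilities of $0, $10M, $30M; any increasing values work
d12 = expected_utility(F1, utilities) - expected_utility(F2, utilities)
d34 = expected_utility(F3, utilities) - expected_utility(F4, utilities)
print(d12, d34)         # the two differences always sum to zero
```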


Next, we address several directions in which the EUM criterion can be generalized. (In 
particular, the situation in Allais’s paradox may be described by using the schemes below; 
we will skip concrete calculations.) 

Note also that examples in this section do not aim to justify criteria we are introducing; 
justification comes from empirical evidence and qualitative reasoning based on axioms 
we will discuss. The goal of these examples is more modest—to demonstrate how criteria 
introduced will work if we accept them, and what new features they can raise in comparison 
with the classical EUM criterion. 


4.2 Weighted utility 
The criterion below is based on two functions: \(u(x)\), which we view as a utility function,
and a non-negative function \(w(x)\), which we call a weighting function. Consider the
functional

\[W(F) = \frac{\int_{-\infty}^{\infty} u(x)\,w(x)\,dF(x)}{\int_{-\infty}^{\infty} w(x)\,dF(x)}, \tag{4.2.1}\]

assuming that the denominator in (4.2.1) is not zero.
Note that if \(F\) is the distribution of a r.v. \(X\), then (4.2.1) may be rewritten as

\[W(F) = \frac{E\{u(X)w(X)\}}{E\{w(X)\}}.\]

The difference between the classical expected utility scheme and the last case is that now
we assign to different values \(x\) weights \(w(x)\). If all weights \(w(x) = 1\), the denominator in
(4.2.1) equals one, and we deal with the EUM case.


Since we compare here distributions rather than r.v.’s themselves, it is convenient to
define a preference order \(\succsim\) on the set \(\mathcal{F} = \{F\}\) of distributions \(F\) for which \(W(F)\) is
well defined. Let an order \(\succsim\) be preserved by the functional \(W\):

\[F \succsim G \iff W(F) \ge W(G).\]

When comparing r.v.’s, we will say that \(X \succsim Y\) if \(F_X \succsim F_Y\), where \(F_X\) and \(F_Y\) are the
distributions of \(X\) and \(Y\), respectively.

Following tradition, we denote by \(\delta_c\) the distribution of a non-random \(X = c\). Then the
certainty equivalent \(c(F)\) is defined as a number \(c\) such that \(\delta_c \sim F\).

In the particular case of this section, \(W(\delta_c) = u(c)w(c)/w(c) = u(c)\). Thus, for the
certainty equivalent \(c = c(F)\) of a r.v. \(X\) with a distribution \(F\), we have \(u(c) = W(F)\), and

\[c(F) = u^{-1}(W(F)). \tag{4.2.2}\]


EXAMPLE 1. Let \(u(x) = x^{\alpha}\), and \(w(x) = x^{\beta}\). We assume \(\alpha > 0\). As to the parameter
\(\beta\), depending on the situation, it may be either positive (the larger a value, the larger its
weight), or negative (the larger a value, the less its weight), or zero. In the last case, \(\beta = 0\),
we deal with the EUM criterion.

For a positive r.v. \(X\) with a distribution \(F\), by definition (4.2.1),

\[W(F) = \frac{E\{X^{\alpha+\beta}\}}{E\{X^{\beta}\}}. \tag{4.2.3}\]

In particular, for \(\alpha = \beta = 1\), we have

\[W(F) = \frac{E\{X^2\}}{E\{X\}} = \frac{E\{X^2\}}{(E\{X\})^2}\,E\{X\}.\]

It is well known that \((E\{X\})^2 \le E\{X^2\}\) (see, for example, Exercise 33b, or recall that
\(\mathrm{Var}\{X\} = E\{X^2\} - (E\{X\})^2 \ge 0\)). Hence, \(W(F) \ge E\{X\}\). Because in the last case \(u(x) = x\)
and, consequently, \(u^{-1}(x) = x\), the certainty equivalent \(c(F) = W(F) \ge E\{X\}\), so we deal
with a risk-loving person. This is not surprising since large values have large weights.


EXAMPLE 2. Let \(X\) in the previous example be uniformly distributed on \([0,d]\). The
reader is invited to verify that in this case, for \(\beta > -1\),

\[W(F) = \frac{1+\beta}{1+\alpha+\beta}\,d^{\alpha}.\]

Following (4.2.2), for the certainty equivalent we have

\[c(F) = (W(F))^{1/\alpha} = \left(\frac{1+\beta}{1+\alpha+\beta}\right)^{1/\alpha} d. \tag{4.2.4}\]

If \(\beta = 0\), we come to the result of Example 3.1.3-2, which is not surprising since in this case
\(w(x) = 1\), and we deal with expected utility. The larger \(\beta\), the greater \(c(F)\), and this is also
understandable: the larger \(\beta\), the larger weights for large \(x\)’s, so the certainty equivalent
must grow as \(\beta\) increases.


Next, we generalize the model from Section 3.2. Consider the maximal premium a client
with a wealth of \(w\) is willing to pay to insure a risk \(\xi\). As in Section 3.2, we write that

\[w - G_{\max} \sim w - \xi. \tag{4.2.5}\]

Since the l.-h.s. of (4.2.5) is certain, (4.2.5) is equivalent to \(w - G_{\max} = c(F_{w-\xi})\), where as
usual, the symbol \(F_X\) stands for the distribution of a r.v. \(X\). So,

\[G_{\max} = w - c(F_{w-\xi}) = w - u^{-1}(W(F_{w-\xi})).\]


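EXAMPLE 3. As in Examples 1-2, let \(u(x) = x^{\alpha}\) and \(w(x) = x^{\beta}\). Let \(\xi\) be uniformly
distributed on \([0,d]\), and \(w = d\). Then the r.v. \(w - \xi\) is also uniformly distributed on \([0,d]\),
and it follows from (4.2.4) that

\[G_{\max} = d - \left(\frac{1+\beta}{1+\alpha+\beta}\right)^{1/\alpha} d = d\left(1 - \left(\frac{1+\beta}{1+\alpha+\beta}\right)^{1/\alpha}\right).\]

The classical EUM case corresponds to \(\beta = 0\). We see that for \(\beta > 0\), the maximum premium
becomes smaller.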
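Formula (4.2.4) and the premium of Example 3 lend themselves to a quick numerical check. The following sketch is an illustration under stated assumptions: the function names are hypothetical, the expectations are approximated by a plain midpoint rule, and only non-negative \(\beta\) is tried (for negative \(\beta\) the integrand \(x^{\beta}\) is singular at zero and this simple rule converges slowly).

```python
def closed_form_ce(alpha, beta, d):
    """Formula (4.2.4): certainty equivalent for X uniform on [0, d]."""
    return ((1 + beta) / (1 + alpha + beta)) ** (1 / alpha) * d

def numeric_ce(alpha, beta, d, n=100_000):
    """u^{-1}(E{u(X)w(X)} / E{w(X)}) with u(x) = x**alpha, w(x) = x**beta, by the midpoint rule."""
    step = d / n
    num = den = 0.0
    for i in range(n):
        x = (i + 0.5) * step
        num += x ** (alpha + beta)
        den += x ** beta
    return (num / den) ** (1 / alpha)

def max_premium(alpha, beta, d):
    """Example 3: G_max = w - c(F_{w - xi}) with w = d and xi uniform on [0, d]."""
    return d - closed_form_ce(alpha, beta, d)

for beta in (0.0, 1.0, 2.0):
    print(beta,
          closed_form_ce(0.5, beta, 2.0),
          numeric_ce(0.5, beta, 2.0),
          max_premium(0.5, beta, 2.0))
```

The printed rows show the certainty equivalent growing and the maximal premium shrinking as \(\beta\) increases, in line with the remarks after (4.2.4) and in Example 3.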


Next, we discuss the linearity issue (see Section 3.5.5). If we replace \(F\) in (4.2.1) by a
mixture \(F^{(\alpha)} = \alpha F_1 + (1-\alpha)F_2\) (compare with Section 3.5.5), we get

\[W(F^{(\alpha)}) = \frac{\alpha \int_{-\infty}^{\infty} u(x)w(x)\,dF_1(x) + (1-\alpha)\int_{-\infty}^{\infty} u(x)w(x)\,dF_2(x)}{\alpha \int_{-\infty}^{\infty} w(x)\,dF_1(x) + (1-\alpha)\int_{-\infty}^{\infty} w(x)\,dF_2(x)}. \tag{4.2.6}\]

In general, this quantity is certainly not equal to \(\alpha W(F_1) + (1-\alpha)W(F_2)\) (see also Exercise
57). Hence, in this case, the linearity property and the Independence Axiom do not hold.


EXAMPLE 4. Consider the Δ-scheme described in Section 3.5.5 with r.v.’s taking only
three fixed values \(x_1, x_2, x_3\). Let \(x_1 < x_2 < x_3\). Any distribution \(F\) of such a r.v. is identified
with the probability vector \((p_1, p_2, p_3)\), where \(p_i\) is the probability of the value \(x_i\). Consider
points in \(\Delta = \{(p_1,p_3) : p_1, p_3 \ge 0,\ p_1 + p_3 \le 1\}\). Setting \(p_2 = 1 - p_1 - p_3\), we have

\[W(F) = \frac{u(x_1)w(x_1)p_1 + u(x_2)w(x_2)p_2 + u(x_3)w(x_3)p_3}{w(x_1)p_1 + w(x_2)p_2 + w(x_3)p_3} = \frac{a p_1 + b p_3 + h}{\tilde a p_1 + \tilde b p_3 + \tilde h}, \tag{4.2.7}\]

where

\[a = u(x_1)w(x_1) - u(x_2)w(x_2), \quad b = u(x_3)w(x_3) - u(x_2)w(x_2), \quad h = u(x_2)w(x_2),\]
\[\tilde a = w(x_1) - w(x_2), \quad \tilde b = w(x_3) - w(x_2), \quad \tilde h = w(x_2).\]

Since we consider only distributions for which \(\int w(x)\,dF(x) > 0\), we consider only
those points in \(\Delta\) for which the denominator in (4.2.7) is positive. All points for which
\(W(F)\) equals a constant \(d\) are points for which \(a p_1 + b p_3 + h = d(\tilde a p_1 + \tilde b p_3 + \tilde h)\), or

\[(a - d\tilde a)p_1 + (b - d\tilde b)p_3 + h - d\tilde h = 0. \tag{4.2.8}\]

This is a line, or more precisely the part of the line (4.2.8) lying in \(\Delta\). However, the slope
of this line depends on \(d\), so lines corresponding to different \(d\)’s are not parallel (!).

Thus, although the Independence Axiom and the corresponding linearity property are not
true in this case, we have a sort of linearity property when we consider equivalent
distributions.

It is interesting that all lines (4.2.8) intersect at the same point. This point is the
intersection of two lines: \(l_0\) defined by the equation \(a p_1 + b p_3 + h = 0\) (points
where \(W(F) = 0\)), and \(l_\infty\) defined by \(\tilde a p_1 + \tilde b p_3 + \tilde h = 0\)
(points where the denominator in (4.2.7) is zero, and hence \(W(F)\) is not defined).
Advice on how to show it rigorously is given in Exercise 56; a typical picture
is depicted in Fig. 13.

FIGURE 13. [An equivalence segment.]

EXAMPLE 5. In the Δ-scheme, let \(x_1 = 0\), \(x_2 = 1\), \(x_3 = 2\); \(u(x) = \sqrt{x}\), and \(w(x) = 1/(1+x)\).
Then, as is easy to verify, \(a = -\tfrac12\), \(b = \tfrac{\sqrt2}{3} - \tfrac12 \approx -0.029\), \(h = \tfrac12\),
\(\tilde a = \tfrac12\), \(\tilde b = -\tfrac16\), and \(\tilde h = \tfrac12\). Thus, up to the third digit, the point of
the intersection of all lines (4.2.8) is the intersection of the lines \(-0.5p_1 - 0.029p_3 + 0.5 = 0\)
and \(0.5p_1 - 0.166p_3 + 0.5 = 0\). The approximate solution is the point (0.707, 5.121).
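The common intersection point in this example can be recomputed without rounding the coefficients. The sketch below is an illustration only (variable names such as `at`, `bt`, `ht` for the coefficients with tildes are hypothetical): it solves the two linear equations by Cramer's rule and then checks that the level line (4.2.8) passes through the same point for several levels \(d\). With unrounded coefficients the point comes out near (0.707, 5.121); rounding \(b\) to \(-0.029\) before solving shifts the result in the third digit.

```python
import math

x = (0.0, 1.0, 2.0)
u = math.sqrt
w = lambda t: 1.0 / (1.0 + t)

uw = [u(t) * w(t) for t in x]         # u(x_i) * w(x_i)
ws = [w(t) for t in x]                # w(x_i)
a, b, h = uw[0] - uw[1], uw[2] - uw[1], uw[1]
at, bt, ht = ws[0] - ws[1], ws[2] - ws[1], ws[1]

# Solve  a*p1 + b*p3 = -h,  at*p1 + bt*p3 = -ht  by Cramer's rule.
det = a * bt - b * at
p1 = (-h * bt + b * ht) / det
p3 = (-a * ht + at * h) / det
print(p1, p3)

# Every level line (4.2.8) should pass through (p1, p3).
for d in (0.1, 0.3, 0.5):
    residual = (a - d * at) * p1 + (b - d * bt) * p3 + h - d * ht
    print(d, residual)
```

Note that the point has \(p_3 > 1\), so it lies outside the triangle \(\Delta\); inside \(\Delta\) the level lines fan out from it without crossing.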


The next scheme generalizes the approach of this section. 


4.3 Implicit or comparative utility 
4.3.1 Definitions and examples 


In this section, we consider not a utility function u(x) but a function v(x,y) which we 
will call an implicit utility or comparative utility function, and interpret it as a function 
indicating to what extent income x is preferable to income y. One may say that v(x, y) is the 
comparative utility of x with respect to y. In light of this, we assume v(x,x) = 0, v(x,y) > 0
if x > y, and v(x,y) < 0 if x < y. Sometimes one can assume v(x,y) = −v(y,x), but in general
this may be false: x may be “better” than y to a smaller extent than y is “worse” than x; see
also Example 3 below. 

It is natural to assume that v(x, y) is non-decreasing in x and is non-increasing in y, which 
again reflects the property “the larger, the better”. 


EXAMPLE 1. Let

\[v(x,y) = \frac{x-y}{1+|x|+|y|}. \tag{4.3.1}\]

In this case, for small \(x\) and \(y\) the comparative utility almost equals the difference \(x - y\),
while for large \(x\)’s and \(y\)’s the measure \(v(x,y)\) may be viewed as a relative difference: \(x - y\)
is divided by \(1 + |x| + |y|\).


We define the certainty equivalent of a r.v. X as a solution to the equation 
E{v(X,c)} =0, (4.3.2) 


provided that this solution exists and is unique. The interpretation is clear: c(X) is the 
certain amount whose comparative utility with respect to X equals zero on the average. 


EXAMPLE 2. Let \(v(x,y)\) be given by (4.3.1), and let \(X = d > 0\) or 0 with equal probabilities.
Then (4.3.2) is equivalent to \(\tfrac12 v(d,c) + \tfrac12 v(0,c) = 0\). Obviously, \(c\) should be between
\(d\) and 0. So, \(c \ge 0\) and the equation may be written as

\[\frac12\,\frac{d-c}{1+d+c} + \frac12\,\frac{0-c}{1+0+c} = 0.\]

This is a quadratic equation. Its positive solution is

\[c = \frac{\sqrt{1+2d}-1}{2}.\]

As is easy to verify, \(c\) above is less than \(E\{X\} = d/2\), so we have a sort of risk aversion. For
large \(d\), we have \(c \sim \sqrt{d/2}\), which is much smaller than \(d/2\). In Exercise 58, we compute
the maximal accepted premium.


Once we have defined what the certainty equivalent is in this case, we can define the
corresponding preference order by the rule

\[X \succsim Y \iff c(X) \ge c(Y).\]

First of all, note that this scheme includes the classical EU maximization as a particular
case. Indeed, let \(v(x,y) = u(x) - u(y)\), where \(u\) is a utility function. Assume that \(u\) is
increasing, so its inverse \(u^{-1}\) exists. In this case, (4.3.2) implies \(E\{u(X)\} - E\{u(c)\} = 0\),
and since \(c\) is certain, we have \(E\{u(X)\} = u(c)\). Hence, \(c(X) = u^{-1}(E\{u(X)\})\), as in the
classical case. Because \(u^{-1}(x)\) is increasing, the relation \(c(X) \ge c(Y)\) is equivalent to the
relation \(E\{u(X)\} \ge E\{u(Y)\}\).

Furthermore, the weighted utility scheme is also a particular case of the comparative
utility. To show it, set

\[v(x,y) = w(x)[u(x) - u(y)].\]

In this case, for the certainty equivalent \(c\) of a r.v. \(X\) with a distribution \(F\) we write

\[0 = E\{v(X,c)\} = \int_{-\infty}^{\infty} v(x,c)\,dF(x) = \int_{-\infty}^{\infty} w(x)[u(x) - u(c)]\,dF(x)
= \int_{-\infty}^{\infty} u(x)w(x)\,dF(x) - u(c)\int_{-\infty}^{\infty} w(x)\,dF(x).\]

From this it follows that \(u(c) = W(F)\), where \(W(F)\) is the same as in (4.2.1). If \(u(x)\) is
strictly increasing, then \(c = u^{-1}(W(F))\). Since \(u^{-1}\) is also strictly increasing, \(c(F) \ge c(G)\)
iff \(W(F) \ge W(G)\).


Let now \(v(x,y)\) be concave with respect to \(x\). Consider a r.v. \(X\) and set \(m = E\{X\}\) and
\(c = c(X)\). By definition, \(v(m,m) = 0\). Then, by Jensen’s inequality,

\[v(m,m) = 0 = E\{v(X,c)\} \le v(E\{X\},c) = v(m,c).\]

Thus, \(v(m,m) \le v(m,c)\). Since \(v(x,y)\) is non-increasing in \(y\), this implies that

\[c(X) \le E\{X\}.\]

A good example of such a function \(v\) is

\[v(x,y) = g(x-y),\]

where \(g(s)\) is a concave increasing function such that \(g(0) = 0\). Note that in this case we
should not expect \(v(x,y) = -v(y,x)\).


EXAMPLE 3 (a jealous person). Let Mr. J.’s implicit utility function be \(v(x,y) = g(x-y)\), where

\[g(s) = \begin{cases} \dfrac{s}{1+s} & \text{if } s \ge 0,\\[4pt] s & \text{if } s < 0. \end{cases}\]

The function \(g(s)\) is concave; its graph is given in Fig. 14.

FIGURE 14.

Mr. J. may be characterized as pretty jealous. Assume
that \(x\) is Mr. J.’s wealth, and he compares it with a Mr. A.’s wealth \(y\). If \(x\) is much larger than
\(y\), the comparative utility \(v(x,y)\) is, nevertheless, not large, and \(v(x,y) \to 1\) as \(x - y \to \infty\).
(Mr. J. does not think that his wealth is much more valuable than that of Mr. A.)

On the other hand, if \(x\) is much smaller than \(y\), the comparative utility is negative with
a large absolute value, and \(v(x,y) \to -\infty\) as \(x - y \to -\infty\). (Now Mr. J. considers himself
much less happy than Mr. A.)

Let a r.v. \(X = d > 0\) or 0 with equal probabilities. In this case, equation (4.3.2) is reduced
to \(\tfrac12 g(d-c) + \tfrac12 g(-c) = 0\), which leads to

\[\frac{d-c}{1+d-c} - c = 0.\]

This is a quadratic equation. The solution that lies between \(d\) and 0 is

\[c = c(X) = \frac{d}{1 + (d/2) + \sqrt{1 + (d^2/4)}}.\]

The denominator is greater than two. So, \(c < d/2 = E\{X\}\).
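The closed-form certainty equivalents of Example 2 (with \(v\) from (4.3.1)) and of Example 3 can be cross-checked by solving \(E\{v(X,c)\} = 0\) numerically. The sketch below is a minimal illustration: the function `solve_ce` is a hypothetical helper that uses plain bisection and assumes \(E\{v(X,c)\}\) is decreasing in \(c\) on the search interval, which holds for both examples with non-negative outcomes.

```python
import math

def solve_ce(v, values, probs, lo, hi, tol=1e-12):
    """Bisection for E{v(X, c)} = 0, assuming E{v(X, c)} is decreasing in c on [lo, hi]."""
    def mean_v(c):
        return sum(p * v(x, c) for x, p in zip(values, probs))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_v(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def v_relative(x, y):
    """Comparative utility (4.3.1)."""
    return (x - y) / (1 + abs(x) + abs(y))

def g(s):
    """The concave function g of Example 3."""
    return s / (1 + s) if s >= 0 else s

def v_jealous(x, y):
    return g(x - y)

d = 4.0
c2 = solve_ce(v_relative, (d, 0.0), (0.5, 0.5), 0.0, d)
c3 = solve_ce(v_jealous, (d, 0.0), (0.5, 0.5), 0.0, d)
print(c2, (math.sqrt(1 + 2 * d) - 1) / 2)                   # Example 2
print(c3, d / (1 + d / 2 + math.sqrt(1 + d * d / 4)))       # Example 3
```

For \(d = 4\) both certainty equivalents fall below \(E\{X\} = 2\), consistent with the risk aversion noted in the two examples.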


4.3.2 In what sense the implicit utility criterion is linear 


As we saw, the equation (4.3.2) may be written as

\[\int_{-\infty}^{\infty} v(x,c)\,dF(x) = 0, \tag{4.3.3}\]

where \(F\) is the d.f. of \(X\). Consequently, the solution to this equation depends not on the r.v.
\(X\) itself but on its d.f. \(F\). That is, \(c(X)\) is a function (or functional) of \(F\), and it should not
cause confusion if we use also the notation \(c(F)\), defining it as a solution to (4.3.3).

Consider a set \(\mathcal{F} = \{F\}\) of distributions. Assume that the function \(v(x,y)\) is such that
\(c(F)\) exists and is unique for each \(F \in \mathcal{F}\). Let us define a preference order \(\succsim\) in \(\mathcal{F}\) by

\[F \succsim G \iff c(F) \ge c(G).\]

Proposition 9 Let \(F \sim G\) (\(F\) is equivalent to \(G\)). Then for any \(\alpha \in [0,1]\) the mixture
\(F^{(\alpha)} = \alpha F + (1-\alpha)G \sim F\).


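Proof is short. If \(F \sim G\), then \(F\) and \(G\) have the same certainty equivalent. Denote it by
\(c\). By definition (4.3.3),

\[\int_{-\infty}^{\infty} v(x,c)\,dF(x) = 0, \quad\text{and}\quad \int_{-\infty}^{\infty} v(x,c)\,dG(x) = 0.\]

Then

\[\int_{-\infty}^{\infty} v(x,c)\,dF^{(\alpha)}(x) = \alpha\int_{-\infty}^{\infty} v(x,c)\,dF(x) + (1-\alpha)\int_{-\infty}^{\infty} v(x,c)\,dG(x) = 0.\]

Hence, \(c\) is the certainty equivalent for \(F^{(\alpha)}\) too. ∎

Geometrically, it means that equivalent points still lie in the same line, but equivalency
lines may not be parallel.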


EXAMPLE 1. Consider again the Δ-scheme for \(n = 3\). In this case, equation (4.3.3) may
be written as

\[v(x_1,c)p_1 + v(x_2,c)p_2 + v(x_3,c)p_3 = 0.\]

Since \(p_2 = 1 - p_1 - p_3\), we can rewrite it as

\[a(c)p_1 + b(c)p_3 + h(c) = 0, \tag{4.3.4}\]

where \(a(c) = v(x_1,c) - v(x_2,c)\), \(b(c) = v(x_3,c) - v(x_2,c)\), \(h(c) = v(x_2,c)\).

Let us compare this with what we did in Section 3.5.5, where \(a\), \(b\), and \(h\) did not depend
on \(c\). For a fixed \(c\), all points that satisfy (4.3.4) lie in a line, and the slope of this line
depends on \(c\). Unlike the case of weighted utility, the dependence of the slope on \(c\) is
rather arbitrary, so we should not expect that all these lines intersect at the same point.
A typical picture is given in Fig. 15.

FIGURE 15. [An equivalence segment.]


As we saw, and as we see now again, the Independence Axiom (IA) does not hold in this case.
A general theory which we do not present here shows that IA may be replaced by the
following weaker

Axiom 10 (The Betweenness Axiom.) Let \(F, G \in \mathcal{F}\), and \(F \sim G\). Then for any \(\alpha \in [0,1]\),
\(\alpha F + (1-\alpha)G \sim G\).

We see that, unlike the IA, the Betweenness Axiom (BA) deals not with the case when
\(F \succsim G\) but only with equivalent \(F\) and \(G\). The fact that BA is weaker than IA is presented
in

Proposition 11 If IA holds, then BA holds too.


Proof. Set \(H = G\) in the formulation of the Independence Axiom 6. Let the IA hold, and
let \(F \succsim G\). Then \(\alpha F + (1-\alpha)G \succsim \alpha G + (1-\alpha)G = G\), that is, \(\alpha F + (1-\alpha)G \succsim G\). Now
let \(F \sim G\). Then \(F \succsim G\) and \(G \succsim F\) simultaneously. Consequently, \(\alpha F + (1-\alpha)G \succsim G\)
and \(\alpha F + (1-\alpha)G \precsim G\) simultaneously. That is, \(\alpha F + (1-\alpha)G \sim G\). ∎


It can be proved that, together with some more technical assumptions,


The Betweenness Axiom implies the implicit (or comparative) utility principle. 


4.4 Rank Dependent Expected Utility 
4.4.1 Definitions and examples 


The next approach essentially differs from those of previous sections. For simplicity, we 
restrict ourselves to non-negative r.v.’s. 

Consider probability distributions \(F\) on \([0,\infty)\) and two functions: \(u(x)\) viewed again as
a utility function, and a function \(\Psi(p)\) defined on \([0,1]\), which we call a transformation
function. We assume \(\Psi(p)\) to be non-decreasing, \(\Psi(0) = 0\), \(\Psi(1) = 1\). We consider an
individual (or investor) whose preferences are preserved by the function (or functional)

\[R(F) = \int_0^{\infty} u(x)\,d\Psi(F(x)). \tag{4.4.1}\]


The transformation (or weighting) function reflects the attitude of the individual to
different probabilities. The individual, when perceiving information about the distribution
\(F\), “transforms” the actual distribution function \(F(x)\) into another one, \(F_\Psi(x) = \Psi(F(x))\),
underestimating or overestimating real probabilities.

Let us show that \(F_\Psi(x)\) is indeed a distribution function.² First, since we consider
non-negative r.v.’s, we have \(F(x) = 0\) for \(x < 0\). Second, due to properties of \(\Psi\), the function
\(F_\Psi(x) = \Psi(F(x))\) is non-decreasing. Moreover, for \(x < 0\) we have \(F_\Psi(x) = \Psi(F(x)) = \Psi(0) = 0\),
and \(F_\Psi(\infty) = \Psi(F(\infty)) = \Psi(1) = 1\). Hence, \(F_\Psi(x)\) is a d.f.

Note also that in (4.4.1), we transform a distribution function, not a density. The latter
transformation would lead to undesirable consequences. For example, a transformation
\(\Psi(f(x))\) of a density \(f\) may not be a density, since \(\int_0^\infty \Psi(f(x))\,dx\) might not equal one; so
we would no longer deal with a probability distribution.

The quantity (4.4.1) is referred to as a Rank Dependent Expected Utility (RDEU). Another
term in use is distorted expectation; see, e.g., [33].

The corresponding preference order \(\succsim\) is preserved by the function \(R(F)\), that is,

\[F \succsim G \iff R(F) \ge R(G).\]


A simple example is \(\Psi(p) = p^{\beta}\). If \(\beta = 1\), the subject perceives \(F\) as is and deals with
the “usual” expected utility. If \(\beta < 1\), then \(p^{\beta} \ge p\), and the investor overestimates the
probability for the income to be less than a fixed value: the investor is “security-minded”.
In the case \(\beta > 1\), the investor underestimates the probability mentioned, being
“potential-minded” (or “opportunity-minded”).

²We skip the question of continuity from the right. Since we required \(\Psi(0) = 0\), we cannot require \(\Psi(p)\) to be
right continuous at \(p = 0\), and hence \(\Psi(F(x))\) may not be right continuous everywhere. However, it does not
influence the integral in (4.4.1), so it is not an essential circumstance.


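EXAMPLE 1. Let \(X\) be uniformly distributed on \([0,1]\), and let \(\Psi(p) = p^{\beta}\). Its distribution
function and the transformed distribution function are

\[F(x) = \begin{cases} 0 & \text{if } x < 0,\\ x & \text{if } x \in [0,1],\\ 1 & \text{if } x > 1, \end{cases} \qquad\text{and}\qquad F_\Psi(x) = \begin{cases} 0 & \text{if } x < 0,\\ x^{\beta} & \text{if } x \in [0,1],\\ 1 & \text{if } x > 1. \end{cases}\]

Then the density of the transformed distribution is

\[f_\Psi(x) = F_\Psi'(x) = \begin{cases} 0 & \text{if } x < 0,\\ \beta x^{\beta-1} & \text{if } x \in [0,1],\\ 0 & \text{if } x > 1. \end{cases}\]

For example, if \(\beta > 1\), the density \(f_\Psi(x)\) is increasing, and while for the original distribution
all values of \(X\) are equally likely, in the “investor’s mind” it is not so: smaller values are
less likely.

To the contrary, if \(\beta < 1\), the density \(f_\Psi(x) \to \infty\) (!) as \(x \to 0\), that is, the investor strongly
overestimates the probability to get nothing.

The case \(\beta = 0\) corresponds to an “absolutely pessimistic” investor: \(F_\Psi(x) = 0\) for \(x < 0\),
and \(= 1\) for \(x > 0\), that is, \(F_\Psi\) is the distribution of a r.v. \(X = 0\). In this case, the investor
expects that she/he will get nothing for sure.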


EXAMPLE 2. Assume that an investor does not distinguish small values of the income. 
For instance, hoping for an income equal to $100,000 on the average, the investor considers 
income values of $1 or even $100 as too small and, consciously or not, identifies them with 
zero income. 

Denote by \(F\) the distribution of the investor’s income and assume that the investor
identifies with zero all values which are less than the \(\gamma\)-quantile \(q_\gamma(F)\) for some small fixed \(\gamma\).

Suppose the same is true for “inconceivable” large values. For instance, the same investor
may consider $1 million or $10 million an improbable luck, and (consciously or not)
identify these numbers. More precisely, choosing for simplicity the same level \(\gamma\), assume
that the investor identifies with \(q_{1-\gamma}(F)\) all values which are larger than \(q_{1-\gamma}(F)\).

In both cases, we may talk about the existence of a perception threshold. Such a situation
may be described by the truncation transforming function \(\Psi\) such that

\[\Psi(0) = 0, \quad\text{and}\quad \Psi(p) = \begin{cases} \gamma & \text{if } 0 < p < \gamma,\\ p & \text{if } \gamma \le p \le 1-\gamma,\\ 1 & \text{if } p > 1-\gamma. \end{cases}\]

In this case,

\[F_\Psi(x) = \begin{cases} \gamma & \text{if } 0 \le x < q_\gamma(F),\\ F(x) & \text{if } q_\gamma(F) \le x \le q_{1-\gamma}(F),\\ 1 & \text{if } x > q_{1-\gamma}(F). \end{cases}\]

Hence,

\[R(F) = u(0)\,\gamma + \int_{q_\gamma(F)}^{q_{1-\gamma}(F)} u(x)\,dF(x) + u(q_{1-\gamma}(F))\,\gamma. \tag{4.4.2}\]

The functional (4.4.2) is not linear and should be distinguished from the naive criterion
when truncation is carried out at a fixed, perhaps large, value not depending on \(F\).


EXAMPLE 3. Let \(F\) be the distribution of a r.v. taking only two values, say, \(a\) and \(b > a\)
with respective probabilities \(p\) and \(1 - p\). Then \(F(x) = 0\) if \(x \in [0,a)\), \(F(x) = p\) if \(x \in [a,b)\),
and \(F(x) = 1\) if \(x \in [b,\infty)\). Consequently, \(\Psi(F(x)) = 0\) if \(x \in [0,a)\), \(\Psi(F(x)) = \Psi(p)\) if
\(x \in [a,b)\), and \(\Psi(F(x)) = 1\) if \(x \in [b,\infty)\). Then, by (4.4.1),

\[R(F) = u(a)\Psi(p) + u(b)[1 - \Psi(p)]. \tag{4.4.3}\]

In this case, \(\Psi(\cdot)\) “transforms” just one probability \(p\).

Note also that if a r.v. \(X = c\), then its d.f. \(\delta_c(x) = 1\) or 0 for \(x \ge c\) and \(x < c\), respectively.
Hence, \(\Psi(\delta_c(x))\) also equals 1 or 0 for \(x \ge c\) and \(x < c\), respectively, and by (4.4.1) or
(4.4.3),

\[R(\delta_c) = u(c). \tag{4.4.4}\]


EXAMPLE 4. Let a person having, for instance, the utility function \(u(x) = \sqrt{x}\), choose
between two retirement plans: either the annual pension is equal to \(X = \$100{,}000\),
or it is equal to a r.v. \(Y = \$50{,}000\) or \(\$200{,}000\) with probabilities 1/2.

For the numbers above, the expected utility criterion leads to a slight preference for the
latter plan (\(E\{u(X)\} \approx 316\) and \(E\{u(Y)\} \approx 335\)), which does not look very realistic. One
would expect most people to choose the plan \(X\).

On the other hand, by (4.4.4), we have \(R(F_X) = u(10^5)\) and, by (4.4.3),
\(R(F_Y) = u(5\cdot 10^4)\,\Psi(1/2) + u(2\cdot 10^5)\,[1 - \Psi(1/2)]\).

It is easy to calculate that \(R(F_X) > R(F_Y)\), that is, the individual would prefer \(X\), if
\(\Psi(1/2) > 0.59\). This means that such a person slightly overestimates the probability of the
unlucky event to get $50,000 (since this probability equals 1/2). So, one can expect \(\Psi(p)\)
to be concave for large \(p\)’s. Certainly, this naive example is given merely for illustration.
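The threshold 0.59 in this example follows directly from (4.4.3) and (4.4.4), and the arithmetic can be reproduced in a few lines. The sketch below is illustrative: the function name `rdeu_two_point` and the power form \(\Psi(p) = p^{\beta}\) with \(\beta = 0.7\) are assumptions introduced here, the latter simply being one transformation that overweights the bad outcome enough to reverse the expected-utility ranking.

```python
import math

def rdeu_two_point(a, b, p, u, psi):
    """Formula (4.4.3): R(F) for P(X = a) = p, P(X = b) = 1 - p, with a < b."""
    return u(a) * psi(p) + u(b) * (1 - psi(p))

u = math.sqrt
r_x = u(100_000)                                   # R(F_X) = u(10^5), by (4.4.4)

# Smallest value of Psi(1/2) for which the sure plan X is preferred.
threshold = (u(200_000) - r_x) / (u(200_000) - u(50_000))
print(threshold)

identity = lambda p: p                             # plain expected utility
power = lambda p: p ** 0.7                         # assumed security-minded transformation

r_y_eu = rdeu_two_point(50_000, 200_000, 0.5, u, identity)
r_y_rdeu = rdeu_two_point(50_000, 200_000, 0.5, u, power)
print(r_x, r_y_eu, r_y_rdeu)
```

With the identity transformation the risky plan \(Y\) wins (about 335 against 316), while with the assumed power transformation \(\Psi(1/2) \approx 0.62\) exceeds the threshold and the sure plan \(X\) is preferred.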


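For the certainty equivalent \(c = c(F)\) of a distribution \(F\), we have \(u(c) = R(F)\), and

\[c(F) = u^{-1}(R(F)). \tag{4.4.5}\]

EXAMPLE 5. Let \(F\) be the uniform distribution on \([0,b]\), \(u(x) = x^{\alpha}\), and \(\Psi(p) = p^{\beta}\).
Then, by (4.4.5),

\[c(F) = \left(\int_0^b x^{\alpha}\,d\!\left(\frac{x}{b}\right)^{\beta}\right)^{1/\alpha} = \left(\frac{\beta}{b^{\beta}}\int_0^b x^{\alpha+\beta-1}\,dx\right)^{1/\alpha} = \left(\frac{\beta}{\alpha+\beta}\right)^{1/\alpha} b.\]

For \(\beta = 1\), we have \(c(F) = [1/(1+\alpha)]^{1/\alpha}\,b\), which corresponds to the EUM case considered
in Example 3.1.3-2. For \(\beta > 1\), the certainty equivalent gets larger; for \(\beta < 1\), smaller. It
makes sense: for \(\beta > 1\), the individual underestimates the probabilities of “bad events”,
so the certainty equivalent is larger in comparison with the case when the probabilities
mentioned are perceived correctly. The case \(\beta < 1\) is the opposite.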
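The certainty equivalent in this example can also be approximated directly from definition (4.4.1). The sketch below is only an illustration with hypothetical function names: `rdeu` approximates \(\int u(x)\,d\Psi(F(x))\) by a Riemann-Stieltjes sum over a uniform grid, evaluating \(u\) at cell midpoints, and it assumes the transformed distribution has no atom at the left endpoint (true here for \(\beta > 0\)).

```python
def rdeu(u, F, psi, lo, hi, n=100_000):
    """Riemann-Stieltjes sum for R(F) = integral of u(x) d psi(F(x)) over [lo, hi]."""
    step = (hi - lo) / n
    total = 0.0
    prev = psi(F(lo))
    for i in range(1, n + 1):
        x = lo + i * step
        cur = psi(F(x))
        total += u(x - 0.5 * step) * (cur - prev)   # u at the cell midpoint
        prev = cur
    return total

def ce_numeric(alpha, beta, b):
    """c(F) = u^{-1}(R(F)) for F uniform on [0, b], u(x) = x**alpha, psi(p) = p**beta."""
    u = lambda x: x ** alpha
    F = lambda x: min(max(x / b, 0.0), 1.0)
    psi = lambda p: p ** beta
    return rdeu(u, F, psi, 0.0, b) ** (1 / alpha)

def ce_closed(alpha, beta, b):
    """Closed form of the example: (beta / (alpha + beta))**(1/alpha) * b."""
    return (beta / (alpha + beta)) ** (1 / alpha) * b

for alpha, beta in ((0.5, 2.0), (1.0, 0.5), (1.0, 1.0)):
    print(alpha, beta, ce_numeric(alpha, beta, 3.0), ce_closed(alpha, beta, 3.0))
```

Working with increments of \(\Psi(F(x))\) rather than with a density keeps the sum well behaved even when \(\beta < 1\), where the transformed density is unbounded near zero.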




4.4.2 Application to insurance 


Consider the insurance model from Section 3.2, keeping the same notation for the wealth
\(w\), the random loss \(\xi\), and the premium \(G\). Following the same logic, we see that for \(G\) to
be acceptable, the certain quantity \(w - G\) should be preferable to the r.v. \(w - \xi\).

For the maximal accepted premium \(G_{\max}\), the r.v. \(w - G_{\max}\) should be equivalent to \(w - \xi\).
In the RDEU framework, this means that

\[u(w - G_{\max}) = R(F_{w-\xi}), \tag{4.4.6}\]

where, as usual, \(F_X\) denotes the distribution of a r.v. \(X\).


EXAMPLE 1. As in Example 3.2-1, let \(u(x) = 2x - x^2\), \(w = 1\), and the r.v. \(\xi\) be uniformly
distributed on \([0,1]\). Let \(\Psi(p) = p^{\beta}\). Since \(1 - \xi\) is also uniformly distributed on \([0,1]\),

\[R(F_{1-\xi}) = \int_0^1 (2x - x^2)\,dx^{\beta} = C_\beta, \quad\text{where } C_\beta = \frac{\beta(3+\beta)}{(1+\beta)(2+\beta)}.\]

Let \(y = w - G_{\max}\). Then \(2y - y^2 = C_\beta\). As in Example 3.2-1, we obtain from this that

\[G_{\max} = \sqrt{1 - C_\beta} = \sqrt{\frac{2}{(1+\beta)(2+\beta)}}.\]

If \(\beta = 0\) (the absolutely pessimistic investor from Example 4.4.1-1), \(G_{\max}\) is equal to one,
that is, to the maximal possible loss. (The investor feels that the maximal loss will happen.)
The larger \(\beta\), the less \(G_{\max}\), which is natural. For \(\beta = 1\), we have \(G_{\max} = 1/\sqrt{3}\) as in the
expected utility case in Example 3.2-1.
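The premium in this example can be verified numerically from (4.4.6). The sketch below is an illustration under the example's assumptions (\(u(x) = 2x - x^2\), \(w = 1\), \(\Psi(p) = p^{\beta}\)); the function names are hypothetical, the Stieltjes integral is approximated by a sum over a uniform grid, and \(\beta > 0\) is assumed so that \(\Psi(0) = 0\) is reproduced by the power function.

```python
import math

def c_beta(beta, n=100_000):
    """R(F_{1-xi}): integral over [0, 1] of (2x - x^2) d(x**beta), as a Riemann-Stieltjes sum."""
    total, prev = 0.0, 0.0
    for i in range(1, n + 1):
        x = i / n
        mid = x - 0.5 / n                    # midpoint of the current cell
        cur = x ** beta
        total += (2 * mid - mid * mid) * (cur - prev)
        prev = cur
    return total

def g_max_numeric(beta):
    """Solve 2y - y^2 = C_beta for y in [0, 1]; then G_max = 1 - y = sqrt(1 - C_beta)."""
    return math.sqrt(1.0 - c_beta(beta))

def g_max_closed(beta):
    """Closed form sqrt(2 / ((1 + beta) * (2 + beta)))."""
    return math.sqrt(2.0 / ((1 + beta) * (2 + beta)))

for beta in (0.5, 1.0, 2.0):
    print(beta, g_max_numeric(beta), g_max_closed(beta))
```

For \(\beta = 1\) both columns give \(1/\sqrt{3} \approx 0.577\), and the premium decreases as \(\beta\) grows, matching the discussion in the example.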


4.4.3 Further discussion and the main axiom 


Next, we consider some possible forms of the transformation function \(\Psi(p)\). To clarify
the classification below, note that when saying that a subject underestimates the probability
of an event, we mean that the subject perceives the likelihood of this event to be less than
it really is. In the extreme case, the subject neglects the possibility of such an event. The
four cases we discuss below are illustrated in Fig. 16.

• \(\Psi(p) \ge p\) and is concave. For any certain level of income, the subject overestimates
the probabilities that the income will not reach this level. The subject is “security-minded”.

• \(\Psi(p) \le p\) and is convex: the opposite case. The subject is “potential-minded”.

• \(\Psi(p)\) is S-shaped. The subject underestimates the probabilities of very large and very
small values and, consequently, proceeds from moderate values of income.

• \(\Psi(p)\) is inverse-S-shaped: “cautiously hopeful”. Roughly speaking, the subject
overestimates the probabilities of “very large” and “very small” values.

FIGURE 16. Transformation functions.


Many experiments presented in the literature testify to the inverse S-shaped pattern (see, 
e.g., [83]). However, one should realize that these experiments, usually dealing with stu- 
dents, concern one-time gains or investments, and the amounts of money involved are not 
large. In such situations, it is not surprising that people count to some extent on large values 
of the income, overestimating real probabilities of their occurrence. 

In long run investment, when dealing with significant amounts of money and in situa- 
tions when these amounts really matter for the investor (say, in the case of a retirement 
plan), the investor may exhibit a different behavior, proceeding from moderate values of 
the income rather than from the possibilities of large deviations. In such situations, an 
S-shaped transformation may be more adequate. 


An interesting theory on possible forms of the transformation function \(\Psi\) may be found,
e.g., in [83] and [103].
In conclusion, we discuss the main axiom connected with RDEU. 


Consider an investor with a preference order \(\succsim\) and two d.f.’s, \(F(x)\) and \(G(x)\). Assume
that \(F(x) = G(x)\) for all \(x\)’s from a set \(A\), and suppose that for the investor, \(F \succsim G\).
Now, assume that we change \(F(x)\) and \(G(x)\) in a way such that

• all changes concern only values of \(F(x)\) and \(G(x)\) at \(x\)’s from \(A\),

• when changing \(F(x)\) and \(G(x)\), we keep them equal to each other for \(x \in A\).

FIGURE 17.

If for any \(F\) and \(G\), after such a change, the investor continues to prefer \(F\) to \(G\), we say
that the investor’s preferences satisfy the ordinal independence axiom. The idea of such
axioms (if you change the common part, the relation does not change) is referred to as the
sure-thing principle. Formally, the above axiom may be stated as follows.

For any two d.f.’s, \(F(x)\) and \(G(x)\), such that \(F(x) = G(x)\) for all \(x\)’s from some set \(A\),
consider two other d.f.’s, \(\tilde F(x)\) and \(\tilde G(x)\), such that \(\tilde F(x) = F(x)\) and \(\tilde G(x) = G(x)\) for all
\(x \notin A\), and \(\tilde F(x) = \tilde G(x)\) for \(x \in A\); see also Fig. 17.

Consider the set \(\mathcal{F} = \{F\}\) of all distributions \(F\) and a preference order \(\succsim\) on \(\mathcal{F}\).

Axiom 12 (The ordinal independence). For any pair \(F, G\) from \(\mathcal{F}\), if \(F \succsim G\), then for any
distributions \(\tilde F\) and \(\tilde G\) with the above-mentioned properties, \(\tilde F \succsim \tilde G\).


One can prove that along with some more technical assumptions, 


The ordinal independence axiom implies the RDEU principle. 


To conclude the whole Section 4, it is worth pointing again to one special feature of 
all models we considered: we identified random variables with their distributions. In other 
words, two r.v.’s with the same distributions are viewed in these models as equivalent. Much 
evidence has been accumulated showing that this is not always the case. Sometimes people 
distinguish outcomes with the same probabilities if the ways leading to these outcomes are 
different. The question touched on is connected with the so called coalescing property in 
the modern theory of gambles; see, for example, [83]. This interesting issue, however, is 
beyond the scope of this book. 




5 OPTIMAL PAYMENT FROM THE STANDPOINT OF AN INSURED


5.1 Arrow’s theorem 


We consider here the following problem. 

An individual with a wealth of w is facing a random loss X with a mean m > 0. To protect
her/himself against at least a part of the risk, the individual appeals to an insurer. 

The insurer, having many clients, when specifying the corresponding premium g, pro- 
ceeds merely from the mean value of the future payment. 

As a particular example, we may consider the situation when, if the mean payment is
$\lambda$, the insurer agrees to sell the coverage for the premium $g = (1+\theta)\lambda$ for a fixed $\theta > 0$.
The coefficient $\theta$ is called a relative security loading coefficient. For instance, if $\theta = 0.1$,
the insurer adds 10% to the mean payment. In the following chapters, we will consider the
characteristic $\theta$ repeatedly and in detail.

However, the case of such a determination of the premium is merely an example. Here,
for us it is only important that there is a strict correspondence between $g$ and $\lambda$, and once $\lambda$
is given, the premium $g$ is fixed.

If the coverage is complete, the mean payment is equal to the mean loss, that is, $\lambda = m$.
(In the particular case of security loading, the premium would be $g_{\text{complete}} = (1+\theta)m$.)

However, the premium in the case of complete coverage may be too large for the
individual (or she/he may be just not willing to pay it). In this case, the individual buys a
non-complete coverage with a mean payment $\lambda < m$. The policy is then specified by
a payment function $r(x)$, the amount that the insurer will pay if the loss $X$ assumes a value
$x$. Since the coverage is not complete, $0 \le r(x) \le x$. Note right away that this implies that
$0 \le r(0) \le 0$; that is, $r(0) = 0$.

As we have assumed, the insurer requires only one condition on r(x) to hold: 


$$E\{r(X)\} = \lambda. \tag{5.1.1}$$


In this case, the premium $g$ is completely specified by $\lambda$, and the individual can choose any
$r(x)$ provided that (5.1.1) is true. The question is which $r(x)$ is the best.

Before considering examples of possible payment functions, note that (5.1.1) may be
rewritten as follows.

Assume that $r(x)$ is non-decreasing, which is natural. Let $F_0(x)$ be the d.f. of $X$. Then,
by virtue of (0.2.2.1),


$$E\{r(X)\} = \int_0^\infty (1 - F_0(x))\,dr(x).$$


Thus, condition (5.1.1) may be rewritten as

$$\int_0^\infty (1 - F_0(x))\,dr(x) = \lambda. \tag{5.1.2}$$


Consider particular examples of the payment function r(x). 
EXAMPLE 1 (Proportional insurance or quota share insurance). In this case, $r(x) = kx$,
$k \le 1$. Then $E\{r(X)\} = E\{kX\} = km$, and (5.1.1) implies that $k = \lambda/m$.
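
As a small worked illustration, suppose (purely as assumed figures, not taken from the text) that the mean loss is $m = 100$, the insured chooses a mean payment $\lambda = 60$, and the insurer applies a security loading $\theta = 0.1$. Then the quota share and the premiums would be:

```latex
\[
k = \frac{\lambda}{m} = \frac{60}{100} = 0.6, \qquad
g = (1+\theta)\lambda = 1.1 \cdot 60 = 66, \qquad
g_{\mathrm{complete}} = (1+\theta)m = 1.1 \cdot 100 = 110 .
\]
```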




EXAMPLE 2 (Excess-of-loss or stop-loss insurance). We will also call it insurance with
a deductible. In this case,


$$r(x) = r_d(x) = \begin{cases} 0 & \text{if } x \le d, \\ x - d & \text{if } x > d, \end{cases} \tag{5.1.3}$$

where the number $d$ is called a deductible. In this case, payment is carried out only if the
loss exceeds the level $d$, and if this happens, the insurer pays the overshoot. The term
“excess-of-loss” is used when such a rule concerns each contract separately, and “stop-loss”
when it concerns a whole risk portfolio.

Inserting (5.1.3) into (5.1.2), we have


$$\int_d^\infty (1 - F_0(x))\,dx = \lambda. \tag{5.1.4}$$


The last relation is an equation for $d$ given $\lambda$. Simple particular examples are relegated to
Exercise 61.
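
The equation (5.1.4) for the deductible can also be solved numerically. The following is a minimal sketch, not a method from the text: the helper names `mean_payment` and `solve_deductible`, the exponential loss with mean `m = 2`, the target mean payment `lam = 0.5`, and the truncation point `upper` are all assumptions made for illustration. It uses the midpoint rule for the integral and bisection for the root, relying on the left-hand side of (5.1.4) being non-increasing in d.

```python
import math

def mean_payment(d, survival, upper, steps=5000):
    """Midpoint-rule approximation of the integral of survival(x) over [d, upper]."""
    if d >= upper:
        return 0.0
    h = (upper - d) / steps
    return h * sum(survival(d + (i + 0.5) * h) for i in range(steps))

def solve_deductible(lam, survival, upper, tol=1e-9):
    """Bisection for the deductible d in [0, upper] whose mean payment equals lam.

    Relies on the mean payment being non-increasing in d.
    """
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_payment(mid, survival, upper) > lam:
            lo = mid   # mean payment still too large: raise the deductible
        else:
            hi = mid   # mean payment at or below the target: lower the deductible
    return 0.5 * (lo + hi)

# Hypothetical inputs: exponential loss with mean m, target mean payment lam.
m, lam = 2.0, 0.5
survival = lambda x: math.exp(-x / m)   # 1 - F_0(x) for the assumed exponential loss
upper = 50.0 * m                        # truncation point; the tail beyond it is negligible

d_hat = solve_deductible(lam, survival, upper)
d_exact = m * math.log(m / lam)         # closed form available in the exponential case
print(d_hat, d_exact)
```

For distributions without a closed-form tail integral, only `survival` and `upper` would need to change; the closed form is used here only as a cross-check.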
EXAMPLE 3 (Insurance with a limit coverage). In this case,


$$r(x) = \begin{cases} x & \text{if } x \le s, \\ s & \text{if } x > s, \end{cases}$$


where $s$ is the maximum the insurer will pay. Again using (5.1.2), we see that restriction
(5.1.1) may be written as


$$\int_0^s (1 - F_0(x))\,dx = \lambda,$$


which is an equation for $s$.
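
To get a feel for this equation, assume (for illustration only; the text does not fix a distribution here) that $X$ is exponential with mean $m$, so that $1 - F_0(x) = e^{-x/m}$. Under this assumption the limit $s$ can be found in closed form:

```latex
\[
\int_0^s e^{-x/m}\,dx = m\left(1 - e^{-s/m}\right) = \lambda
\quad\Longrightarrow\quad
s = -m \ln\!\left(1 - \frac{\lambda}{m}\right), \qquad 0 < \lambda < m .
\]
```

In this sketch $s \to \infty$ as $\lambda \to m$, which corresponds to complete coverage.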

We return to the optimization problem. Assume that the preferences of the individual are
preserved by a function $U(F)$. For the reader who skipped Section 4 on non-linear criteria,
we start with the EUM case where


$$U(F) = \int_{-\infty}^{\infty} u(x)\,dF(x), \tag{5.1.5}$$


and $u$ is a non-decreasing utility function. At the end of the section, we consider the
non-linear case.
Denote by $F_{(r)}(x)$ the distribution function of the r.v.


$$X_{(r)} = w - g - X + r(X),$$


the wealth of the individual under the choice of a payment function $r(x)$. Our goal is to
find a function $r$ for which $F_{(r)}$ is the best. More rigorously, we are looking for a function $r^*$
which maximizes the function $Q(r) = U(F_{(r)})$.

It is remarkable that under certain conditions, the optimal payment function does not 
depend on the particular form of the utility function u(x) and on the premium g. More 
precisely, r* has the type (5.1.3) with the deductible d specified in (5.1.4). 

Let us state it rigorously. For a fixed $\lambda \in (0, m]$, consider the set of all functions $r(x)$
satisfying (5.1.2), that is, the set


$$R_\lambda = \{\, r(x) : E\{r(X)\} = \lambda \,\}.$$




The theorem below belongs to K. Arrow; see, e.g., [5], [7].


Theorem 13 Let $u(x)$ in (5.1.5) be concave, and $r^*(x) = r_d(x)$, where $d$ satisfies (5.1.4).
Then for any function $r(x)$ from $R_\lambda$,


$$Q(r) \le Q(r^*). \tag{5.1.6}$$


The optimal payment is the same for any concave utility function.


EXAMPLE 4. Two people facing the same loss $X$ but having different utility functions, say
$\sqrt{x}$ and $\ln x$, will prefer the same deductible policy with the same deductible, provided
that they choose the same mean payment $\lambda$.
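
The claim can be checked numerically on a toy case. The sketch below is an illustration under assumed inputs, not part of the original exposition: the discrete loss distribution, the wealth `w`, the loading `theta`, and the helper names (`expect`, `bisect`, `compare`) are all hypothetical. It builds a deductible, a proportional, and a limit policy with the same mean payment, and compares the expected utility of the final wealth for the two utility functions mentioned in the example.

```python
import math

# Hypothetical discrete loss distribution (illustrative values only).
LOSSES = [0.0, 20.0, 50.0, 100.0]
PROBS = [0.70, 0.15, 0.10, 0.05]

def expect(f):
    """Expectation of f(X) under the discrete loss distribution."""
    return sum(p * f(x) for x, p in zip(LOSSES, PROBS))

def bisect(f, lo, hi, iters=200):
    """Root of a continuous monotone f that changes sign on [lo, hi]."""
    f_lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

m = expect(lambda x: x)        # mean loss
lam = 0.5 * m                  # assumed mean payment
theta = 0.1                    # assumed security loading
w = 200.0                      # assumed initial wealth
g = (1 + theta) * lam          # premium, the same for all three policies

# Parameters chosen so that each policy has mean payment lam.
d = bisect(lambda t: expect(lambda x: max(x - t, 0.0)) - lam, 0.0, max(LOSSES))
s = bisect(lambda t: expect(lambda x: min(x, t)) - lam, 0.0, max(LOSSES))
k = lam / m

POLICIES = {
    "deductible": lambda x: max(x - d, 0.0),
    "proportional": lambda x: k * x,
    "limit": lambda x: min(x, s),
}

def compare(u):
    """Expected utility of the final wealth w - g - X + r(X) for each policy."""
    return {name: expect(lambda x, r=r: u(w - g - x + r(x)))
            for name, r in POLICIES.items()}

for label, u in (("sqrt", math.sqrt), ("log", math.log)):
    print(label, compare(u))
```

In this setup the deductible policy should come out on top for both utilities, in line with Theorem 13; the wealth stays positive for every outcome, so both utility functions are well defined.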




Proof of Theorem 13. As we know from Calculus, the l.-h.s. of (5.1.4), as a function of
$d$, is continuous. By the general formula (0.2.2.2), this function is equal to $m$ at $d = 0$, and
it converges to zero as $d \to \infty$. Hence, for any $\lambda \in (0, m]$, there exists a number $d$ for which
the l.-h.s. of (5.1.4), that is, $E\{r_d(X)\}$, is equal to $\lambda$.

In particular, this means that $r_d \in R_\lambda$.

(As a matter of fact, since $m > 0$, the number $d$ for which (5.1.4) is true is unique.
Indeed, let $x_0$ be the smallest point at which $F_0(x) = 1$. If $F_0(x) < 1$ for all $x$’s, we set
$x_0 = \infty$. Because $m > 0$, the point $x_0 > 0$. Then $F_0(x) < 1$ for all $x < x_0$, and $1 - F_0(x) > 0$
on $[0, x_0)$. Consequently, the l.-h.s. of (5.1.4) is strictly decreasing for $d \in [0, x_0)$.)

Now, let us fix $r \in R_\lambda$ and set $F(x) = F_{(r)}(x)$. Denote by $F^*(x)$ the d.f. of the r.v.
$X^* = X_{(r^*)}$, i.e., the final wealth in the case (5.1.3) with $d$ satisfying (5.1.4). Set
$a = w - g - d$, and $b = w - g$.

Note that for both d.f.’s, $F(b) = F^*(b) = 1$. Since $P(X^* < a) = 0$, and
$P(X^* = a) = P(X \ge d)$, the d.f. $F^*(x) = 0$ for $x < a$, and has a jump of $P(X \ge d)$ at the
point $a$; see Fig.18. For $x \ge a$, we have $F^*(x) = P(w - g - X \le x)$ because $r^*(X) = 0$
for $X \le d$.

On the other hand, $F(x) = P(w - g - X + r(X) \le x)$, and since $r(X) \ge 0$, we have


$$F(x) \le F^*(x) \ \text{ for } x \ge a. \tag{5.1.7}$$

A typical picture is given in Fig.18.


FIGURE 18.


We need also the following relations. Because $E\{r(X)\} = E\{r^*(X)\}$, we have
$E\{X_{(r)}\} = E\{X^*\}$. Therefore, integrating by parts, we get that


$$\int_{-\infty}^{b} [F^*(z) - F(z)]\,dz = \int_{-\infty}^{b} z\,d[F(z) - F^*(z)] = E\{r(X)\} - E\{r^*(X)\} = 0. \tag{5.1.8}$$


(How to prove that $\lim_{z \to -\infty} z[F(z) - F^*(z)] = 0$ is shown in Section 0.2.5.) From (5.1.8)
it follows that $\int_{-\infty}^{x} [F^*(z) - F(z)]\,dz = -\int_{x}^{b} [F^*(z) - F(z)]\,dz$ for $x \le b$. Then, in view of
(5.1.7), for $a \le x \le b$,

$$\int_{-\infty}^{x} [F^*(z) - F(z)]\,dz \le 0. \tag{5.1.9}$$


On the other hand, since $F^*(x) = 0$ for $x < a$, inequality (5.1.9) is true for $x < a$ also, and
hence it is true for all $x \le b$. Note also that, since $F^*(z) - F(z) = 1 - 1 = 0$ for $z > b$,
eventually (5.1.9) is true for all $x$’s.

Let us proceed to a direct proof. Assuming for simplicity that $u$ is sufficiently smooth,
integrating by parts, and taking into account that $F^*(b) = F(b)$, we have


$$U(F^*) - U(F) = \int_{-\infty}^{b} u(x)\,d[F^*(x) - F(x)] = \int_{-\infty}^{b} [F(x) - F^*(x)]\,u'(x)\,dx
= \int_{-\infty}^{b} u'(x)\,d\left(\int_{-\infty}^{x} [F(z) - F^*(z)]\,dz\right). \tag{5.1.10}$$


Making use of (5.1.8), we integrate by parts in (5.1.10) one more time, which leads to


$$U(F^*) - U(F) = \int_{-\infty}^{b} \left(\int_{-\infty}^{x} [F^*(z) - F(z)]\,dz\right) u''(x)\,dx. \tag{5.1.11}$$


Since $u''(x) \le 0$, (5.1.11) and (5.1.9) imply that $U(F^*) \ge U(F)$. ∎


Route 1 ⇒ page 141


5.2 A generalization 


Now, let us realize that (5.1.9) means that F* dominates F in the sense of the second 
stochastic dominance (SSD); see Section 3.5.3. Consequently, we have proved above the 
following much more general theorem. 


Theorem 14 The payment function $r^*(x)$ is optimal for any preference order $\succsim$ which
is monotone with respect to the SSD.
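
The SSD condition (5.1.9) is easy to verify numerically for discrete distributions, because the integrand is piecewise constant and it suffices to check the integral at the breakpoints. The following is a minimal sketch under assumed data: the loss values, wealth `w`, premium `g`, deductible `d`, and quota share `k` are hypothetical figures chosen so that both policies have the same mean payment, and the function names are not from the text.

```python
def cdf(dist, z):
    """Right-continuous d.f. of a discrete distribution given as (values, probs)."""
    values, probs = dist
    return sum(p for v, p in zip(values, probs) if v <= z)

def ssd_integrals(f_star, f):
    """Integral of [F*(z) - F(z)] over (-inf, x], evaluated at each breakpoint x."""
    points = sorted(set(f_star[0]) | set(f[0]))
    acc, out = 0.0, []
    for left, right in zip(points, points[1:]):
        # Both d.f.'s are constant on [left, right), so the integrand is constant there.
        acc += (cdf(f_star, left) - cdf(f, left)) * (right - left)
        out.append((right, acc))
    return out

def dominates_ssd(f_star, f, eps=1e-9):
    """True if the integral in (5.1.9) is <= 0 (up to eps) at every breakpoint."""
    return all(value <= eps for _, value in ssd_integrals(f_star, f))

# Hypothetical example: loss values/probabilities, wealth, premium (all assumed).
losses = [0.0, 20.0, 50.0, 100.0]
probs = [0.70, 0.15, 0.10, 0.05]
w, g = 200.0, 7.15
d = 70.0 / 3.0     # deductible giving mean payment 6.5 for this loss
k = 0.5            # quota share giving the same mean payment 6.5

wealth_deductible = ([w - g - x + max(x - d, 0.0) for x in losses], probs)
wealth_proportional = ([w - g - x + k * x for x in losses], probs)

print(dominates_ssd(wealth_deductible, wealth_proportional))
print(dominates_ssd(wealth_proportional, wealth_deductible))
```

Under these assumptions the first check should succeed and the reverse one should fail, which is the pattern Theorem 14 relies on.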


EXAMPLE 1. Consider the case of implicit utility described in Section 4.3. Let v(x,y) 
be a given function as it is defined in this section, and let the preference order be preserved 
by the certainty equivalent c(F) defined in (4.3.3). Assume that v(x,y) is concave with 
respect to x and non-increasing in y. We show that the corresponding preference order is 
monotone with respect to the SSD. 

To simplify calculations, assume that $v(x, y)$ is sufficiently smooth. For two distributions,
$F$ and $G$, similarly to (3.5.8), we have


$$\int_{-\infty}^{\infty} v(x, c)\,d[F(x) - G(x)] = \int_{-\infty}^{\infty} \left(\int_{-\infty}^{x} [F(z) - G(z)]\,dz\right) v''_{xx}(x, c)\,dx,$$


where $v''_{xx}(x, c) = \partial^2 v(x, c)/\partial x^2$.


Consequently, if $v''_{xx} \le 0$ and $F$ dominates $G$ in the sense of the SSD, then


$$\int_{-\infty}^{\infty} v(x, c)\,d[F(x) - G(x)] \ge 0 \tag{5.2.1}$$


for all $c$. Let $c = c(F)$. Then $\int_{-\infty}^{\infty} v(x, c(F))\,dF(x) = 0$, and hence $\int_{-\infty}^{\infty} v(x, c(F))\,dG(x) \le 0$.
On the other hand, by definition, $\int_{-\infty}^{\infty} v(x, c(G))\,dG(x) = 0$. Since $v(x, y)$ is non-increasing
in $y$, from this it easily follows that $c(G) \le c(F)$, and hence $F \succsim G$.


5.3 Historical remarks regarding the whole chapter 


The remarks below are far from being comprehensive. The modern theory of comparing 
risky alternatives was established by J. von Neumann and O. Morgenstern in [96]. A rather 
complete exposition of this theory and generalizations may be found in [82] by R.D. Luce
and H. Raiffa.


There is a very rich literature on further developments of this theory. Many of them are
reflected in the monographs by M. Denuit, J. Dhaene, M. Goovaerts and R. Kaas [33], P.
Fishburn [39], R.D. Luce [83], J. Quiggin [106], P. Wakker [140]. The reader can also find
there historical notes and a rich bibliography. A bibliography with comments may also be
found on P. Wakker’s web-site http://people.few.eur.nl/wakker/refs/webrfrncs.doc


To the author’s knowledge, the weighted utility concept was first considered by S.H. 
Chew and K.R. MacCrimmon (1979, [25], [26]) and H. Bühlmann (1980, [21]). The theory
with corresponding axioms was later developed by S.H. Chew (1989, [24]). Other axioms 
leading to close criteria were considered by P. Fishburn (1988, [39]). 


The implicit or comparative utility and the betweenness axiom or axioms quite similar to 
it were suggested and explored independently by A. Ya. Kiruta (1980, [73]), S.A. Smolyak
(1983, [133]; see also [134]) and E. Dekel (1986, [32]). Various results on criteria close to 
comparative utility were considered in the mentioned monograph by P. Fishburn [39], and 
on the betweenness axiom—in the mentioned S.H. Chew’s paper [24]. 


The full RDEU model, including a set of axioms, was first suggested by J. Quiggin 
[107], though some earlier works of J. Quiggin had already contained some relevant ideas; 
see references in J. Quiggin [106]. Some models including weighting functions, say, in 
the case of binary gambles were considered earlier in the prospect theory of D. Kahneman 
and A. Tversky [68]. A special case of RDEU was independently considered in the “dual 
model” of M. Yaari (1987, [146]) and developed further by A. Roell (1987, [114]). To the 
author’s knowledge, an axiomatic system for the most general case including continuous 
distributions, was considered by P. Wakker (1994, [140]). 


The idea of the above proof of Theorem 13 and Theorem 14 belongs to C. Gollier and 
H. Schlesinger; see [48] and also [125]. Other generalizations of Arrow’s theorem (mostly 
using optimization technique) may be found, e.g., in papers by E. Karni [71], A. Raviv 
[109], and I. Zilcha and S.H. Chew [149]. See also references therein. 




6 EXERCISES 


Sections 1 and 2


1. Make sure that you indeed understand why if $E\{X\} = m$ and $\mathrm{Var}\{X\} = \sigma^2 \ne 0$, then for the
normalized r.v. $X^* = (X - m)/\sigma$, we have $E\{X^*\} = 0$, $\mathrm{Var}\{X^*\} = 1$.


2. Find the 0.2-quantile of a r.v. $X$ taking values 0, 3, 7 with probabilities 0.1, 0.3, 0.6,
respectively.


3. This exercise concerns the VaR criterion with a parameter $\gamma$. R.v.’s $X$ with or without indices
correspond to an income.


(a) Let $X_1$ take on values 0, 1, 2, 3 with probabilities 0.1, 0.3, 0.4, 0.2, respectively, and
$X_2$ take on the same values with probabilities 0.1, 0.4, 0.2, 0.3. Find all $\gamma$’s for which
$X_1 \succsim X_2$.

(b) Let r.v. $X_1$ be uniform on [0, 1], and $X_2$ be exponential with $E\{X_2\} = m$. When is the
relation $X_2 \succsim X_1$ true for all $\gamma$’s? Let $m = 1/2$. Find all $\gamma$’s for which $X_1 \succsim X_2$.


(c) Let r.v.’s $X_1 = 1 - \xi_1$, $X_2 = 1 - \xi_2$, where the loss $\xi_1$ is uniform on [0, 1], and $\xi_2$
is exponential with $E\{\xi_2\} = m$. When is the relation $X_1 \succsim X_2$ true for all $\gamma$? Let
$m = 1/2$. Find all $\gamma$’s for which $X_1 \succsim X_2$. (Advice: Compare just $\xi$’s and observe that
$q_\gamma(X) = 1 - q_{1-\gamma}(\xi)$.)

(d) Let $X_1$ be uniform on [0, 3], and let $X_2$ be uniform on [1, 2]. Find all $\gamma$’s for which
$X_1 \succsim X_2$.


4. Solve Exercises 3a,d for the case of the Tail VaR criterion.


5. It is known that if $X_1, ..., X_n$ are independent exponential r.v.’s with unit means, then
$S_n = X_1 + ... + X_n$ has the $\Gamma$-distribution with parameters $(1, n)$. We will prove it in Section 2.2.2.
Let $n = 10$, and let r.v.’s $X_1, ..., X_{10}$ be defined as above. Suppose that $Y_i = 1.1 \cdot X_i$ for
$i = 1, ..., 10$, represent the returns for the investments in 10 independent assets. (For the notion
of “return” see Example 1.2.2-4.) Thus, since $E\{X_i\} = 1$, an investment into each asset
gives 10% profit on the average. Proceeding from the VaR criterion, figure out what is more
profitable: to invest \$10 into one asset, or split it between 10 assets. You are recommended
to use Excel (or other software); the corresponding command in Excel for quantiles is
=GAMMAINV(p, ν, 1/a) where a is the scale parameter.


6. Consider two assets. The investment of a unit of money into the $i$th asset leads to an
income of $1 + \xi_i$ units; $i = 1, 2$. Assume that $\xi$’s have a joint normal distribution, $E\{\xi_i\} = 0$,
$\mathrm{Corr}\{\xi_1, \xi_2\} = \rho$. Suppose that you invest some money into each asset, $K_i$ is the profit of the
investment into the $i$th asset, and $K$ is the total profit. Prove that


$$q_\gamma^2(K) = q_\gamma^2(K_1) + q_\gamma^2(K_2) + 2\rho\, q_\gamma(K_1)\, q_\gamma(K_2).$$


The last formula is relevant to the JP Morgan RiskMetrics™ methodology; see, e.g., [64].
Some references may also be found at http://www.riskmetrics.com and
http://www.gloriamundi.org.


7. (a) Proceeding from (1.2.4) and using integration by parts, prove that

$$E\{X \mid X \le t\} = \frac{1}{F(t)}\left(tF(t) - \int_{-\infty}^{t} F(x)\,dx\right) = t - \frac{1}{F(t)}\int_{-\infty}^{t} F(x)\,dx,$$

where $F(x)$ is the d.f. of $X$.


(b) Show that if $F(t)$ is continuous and $q_\gamma$ is the $\gamma$-quantile of $X$, then


$$E\{X \mid X \le q_\gamma\} = q_\gamma - \frac{1}{\gamma}\int_{-\infty}^{q_\gamma} F(x)\,dx. \tag{6.1}$$


Show that it may be false if $X$ is not a continuous r.v.


(c) Show that $V_{tail}(X)$ is monotone in the class of continuous r.v.’s. (Advice: You can either
use (6.1) or observe that in the continuity case, the conditional d.f. $P(X \le x \mid X \le q_\gamma) =
\frac{1}{\gamma}F(x)$ for $x \le q_\gamma$ and $= 1$ otherwise.)


(d) Sketch a typical graph of a continuous $F(x)$. Consider $\gamma$ for which $q_\gamma < 0$ and point out
the region in the graph whose area equals $\gamma\,|V_{tail}(X)|$.


8. Take real data on the daily stock prices for the stocks of two companies for one year from,
say, http://finance.yahoo.com or another similar site. For different values of $\gamma$, compare the
performance of the companies using the VaR and the Tail VaR criteria. (The absolute values
of the prices should have no effect on results. The analysis should be based on returns, that
is, on the ratios of the prices on the current and the previous days. For the notion of “return”
see Example 1.2.2-4.) Estimate the mean return for each company. Try to characterize
and compare the performance of the companies, taking into account all three characteristics
mentioned.


9. If we interpret $X$ as an income, then the r.v. $\tilde X = -X$ may be interpreted as a loss. Considering
only r.v.’s whose d.f.’s are strictly increasing, do the following.


(a) Prove that $q_\gamma(X) = -q_{1-\gamma}(\tilde X)$. Show it graphically.

(b) Consider, instead of (1.2.3), the function $\tilde V(\tilde X; s) = E\{\tilde X \mid \tilde X > s\}$, the mean value of
the loss given that it has exceeded a level $s$. Show that $\tilde V(\tilde X; s) = -V(X; -s)$ for any
$s$, and $\tilde V(\tilde X; s) = |V(X; -s)|$ for all $s \ge 0$. Give a heuristic explanation.

(c) Consider the criterion preserved by the risk measure $\tilde V_{tail}(\tilde X) = E\{\tilde X \mid \tilde X > q_{1-\gamma}(\tilde X)\}$. Show
that $X \succsim Y \iff \tilde V_{tail}(\tilde X) \le \tilde V_{tail}(\tilde Y)$.


10.** For some $V(X)$ and a family $P$, suppose that (1.3.2) is true. Show that the monotonicity
property and Properties I-III from Section 1.3 are fulfilled. (The converse assertion is more
difficult to prove, but the sufficiency of the representation (1.3.2) is understandable. Recall
that $\min_x (f(x) + g(x)) \ge \min_x f(x) + \min_x g(x)$.)


Section 3 


11. Show that if $u(x)$ is strictly increasing, then the rule (3.1.2) is strictly monotone. (Advice:
Consider $E\{u(X)\} - E\{u(Y)\} = E\{u(X) - u(Y)\}$.)


12. Graph all utility functions from Section 3.1.3.


13. Consider a r.v. $X$ such that $P(X > x) = x^{-1}$ for $x \ge 1$. This is a particular case of the Pareto
distribution which we will consider in Section 2.1.1.3 in detail. Does $X$ have a finite expected
value? Find the certainty equivalent of $X$ for $u(x) = \sqrt{x}$.


14. Prove that the Masset criterion (3.1.4), whatever the parameter a is, positive or negative, has
the following properties.


(a) (Additivity property.) For any two independent r.v.’s $X_1, X_2$, it is true that $c(X_1 + X_2) =
c(X_1) + c(X_2)$. Interpret this, viewing $X_1, X_2$ as the results of two independent investments.


(b) (An analog of Proposition 2: the independence from the initial wealth.) If $c(X_1) \ge
c(X_2)$, then $c(w + X_1) \ge c(w + X_2)$ for any $w$. Give an economic interpretation.


15. Write formulas for the certainty equivalents for the cases 2, 3, 5 from Section 3.1.3.


16. Let $X$ be exponential, and $u(x) = -e^{-\beta x}$ (see Section 3.1.3). Show that for the certainty
equivalent $c(X)$, we have $c(X)/E\{X\} \to 1$ as $E\{X\} \to 0$, that is, $c(X) \sim E\{X\}$ if $E\{X\}$ is small.
Interpret it. (Advice: Use $\ln(1 + x) = x + o(x)$.)


17. Repeat calculations of Example 3.2-1 for the case when $\xi = 0$ with probability 0.9, and is
uniformly distributed on [0, 1] with probability 0.1.


18. An EUM customer of an insurance company has a total wealth of 100 (in some units) and
is facing a random loss $\xi$ distributed as follows: $P(\xi = 0) = 0.9$, $P(\xi = 50) = 0.05$,
$P(\xi = 100) = 0.05$.


(a) Let the utility function of the customer be $u(x) = x - 0.005x^2$ for $0 \le x \le 100$. Graph
it. Is the customer a risk averter?


(b) What would you say in the case $u(x) = x + 0.005x^2$?


(c) For the case 18a, find the maximal premium the customer would be willing to pay to
insure his wealth against the loss mentioned. First, set the equation clearly and explain
it, then solve. Is the premium you found greater or less than $E\{\xi\}$? Might you predict
it from the beginning?


(d) Find the minimal premium which an insurance company would accept to cover the
risk mentioned, if the company’s preferences are characterized by the utility function
$u(x) = \sqrt{x}$, and the company takes 300 as an initial wealth. Is the premium you found
greater or less than $E\{\xi\}$? Might you predict it?


(e) Solve Exercise 18d for the case when the r.v. $\xi$ is uniformly distributed on [0, 100].


(f) Solve Exercise 18c for $u(x) = 200x - x^2 + 349$. (Advice: Look at this function
attentively before starting calculations.)


19. Find the maximal premium the customer would be willing to pay to insure half of the loss in
the situations of Example 3.2-1.


20. Give an explicit example when the maximal acceptable premium for a customer does depend
on the initial wealth. (Advice: You may take $u(x) = \sqrt{x}$ and a r.v. assuming two values.)


21. Take real data on the daily stock prices for the stocks of two companies for one year from,
say, http://finance.yahoo.com or another similar site. Considering a particular utility
function, for instance, $u(x) = -e^{-\beta x}$ for some $\beta$, determine which company is better for an EU
maximizer with this utility function. (Advice: Look at the comment in Exercise 8. To
estimate the expected value $E\{u(X)\}$, where $X$ is a random return, we can use the usual estimate
$\frac{1}{n}[u(X_1) + ... + u(X_n)]$, where $\{X_1, ..., X_n\}$ is the time series based on the data.
Excel is convenient for such calculations.) Add to your analysis the characteristics considered
in Exercise 8. Try to describe the performance of the companies, taking into account all
characteristics you computed.


22. Is Condition Z from Section 3.4.1 a requirement on (a) distribution functions, or (b) random
variables, or (c) the preference order under consideration?


23. Is Condition Z from Section 3.4.1 based on the concept of expected utility?


24. Is it true that for an expected utility maximizer to be a risk averter, his/her utility function
should have a negative second derivative? Give an example. (Advice: Look up the definition
of concavity.)


25. Let $u(x) = x$ for $x \in [0, 1]$, and u(x) = 5 + x for $x > 1$. Is an EU maximizer with this utility
function a risk averter?


26. Check for risk aversion the criteria with utility functions from Section 3.1.3.


27. Let the utility function of a person be $u(x) = e^{ax}$, $a > 0$. Graph it. Is the person a risk averter
or a risk lover? Show that in this case, the comparison of risky alternatives does not depend
on the initial wealth.


28. Let $X$ be exponential, and $u(x) = -e^{-\beta x}$ (see Section 3.1.3). Show that the certainty
equivalent $c(X) \to 0$ as $\beta \to \infty$. Interpret it in terms of risk aversion.


29. In Examples 3.1.3-1 and 2, compare the expected values and the certainty equivalents for
different values of $\alpha$ including the case $\alpha \to 0$. Interpret results.


30. Let $u_1(x)$ and $u_2(x)$ be John’s and Mary’s utility functions, respectively.


(a) How do John’s and Mary’s preferences differ if (i) $u_1(x) = 2u_2(x) + 3$;
(ii) $u_1(x) = -2u_2(x) + 3$.


(b) Let $u_1(x) = \sqrt{x}$ and $u_2(x) = x^{1/3}$. Who is more averse to risk? In what sense?


(c) Let $u_1(x) = -1/\sqrt{x}$ and $u_2(x) = -1/x^{1/3}$. Who is more afraid of being ruined (having
a zero or negative income)? Who is more averse to risk?


31. Let Michael be an EUM with the utility function $u(x) = -\exp\{-\beta x\}$ and $\beta = 0.001$. (For
this value of $\beta$, the values of expected utility in this problem will be in a natural scale.)
Michael compares stocks of two mutual funds. Today’s price for each is \$100 per share.
Michael believes that in a year the price for the first stock will be on the average either 10%
higher or 10% lower with equal probabilities, while for the second stock 10% up or down
should be replaced by a slightly higher figure, approximately 11%. (a) Which mutual fund is
“better” for Michael? Do we need to calculate something? (b) Now, assume that the second
fund invites all people who buy 100 shares to a dinner valued at \$$k$. Which $k$ would make
a difference?


32. Provide calculations to obtain (3.2.6).


33. (a) It is known that for any positive r.v. $X$, the function $n(s) = (E\{X^s\})^{1/s}$ is non-decreasing
in $s$. Using this fact, prove that the r.-h.s. of (3.2.5) is non-decreasing in $\beta$. (Advice:
Set $\xi = \ln(X)$ and write $X = e^{\xi}$.)

(b) Using Jensen’s inequality, prove that indeed $n(s)$ above is non-decreasing. (Advice: We
should prove that if $s < t$, then $n(s) \le n(t)$, which is equivalent to $E\{X^s\} \le (E\{X^t\})^{s/t}$.
Write $X^s = (X^t)^{s/t} = u(X^t)$, where $u(x) = x^{s/t}$. Figure out whether $u(x)$ is concave for
$s < t$.)


34. Write the risk aversion function for the utility function (3.1.3) setting $u_1(x) = \ln x$.


35.* Let $R(x)$ and $R_r(x)$ be absolute and relative risk aversion functions for a utility function $u(x)$.
Find the corresponding risk aversion functions for $u^*(x) = bu(x) + a$, where $a$, $b$ are constants.
Interpret the result in the light of the first property from Section 3.1.2.


36.* Find the absolute risk aversion for $u(x) = e^{\beta x}$, $\beta > 0$. Interpret the fact that the risk aversion
characteristic is negative.


37.* Prove that, up to linear transformation, only exponential utility functions have a constant
absolute risk aversion, and only power utility functions and the logarithm have a constant
relative risk aversion. (Advice: You should consider equations $u'' = cu'$, and $xu''(x) = cu'(x)$.)


38. Consider two r.v.’s both taking values 1, 2, 3, 4. For the first r.v. the respective probabilities
are 0.1, 0.2, 0.5, 0.2, for the second, 0.1, 0.3, 0.3, 0.3. Which r.v. is better in the
EUM-risk-aversion case? Justify the answer.


39. Consider a r.v. taking values $x_1, x_2, ...$ with probabilities $p_1, p_2, ...$, respectively. Let the $x_i$’s be
equally spaced, that is, $x_{i+1} - x_i$ equals the same number for all $i$. Assume that for a particular
$i$, we replaced probabilities $p_{i-1}, p_i, p_{i+1}$ by probabilities $p_{i-1} + \Delta$, $p_i - 2\Delta$, $p_{i+1} + \Delta$ where
a positive $\Delta \le p_i/2$. Has the new distribution become worse or better in the
EUM-risk-aversion case? Justify the answer.


40. Consider two distributions, $F_1$ and $F_2$. The distribution $F_1$ is uniform on $[-1, 1]$. The
distribution $F_2$ is the triangular distribution on the same interval. More precisely, the corresponding
density

$$f_2(x) = \begin{cases} x + 1 & \text{if } -1 \le x \le 0, \\ 1 - x & \text{if } 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$


Let $f_1(x)$ be the density of the first distribution.


Graph the densities $f_1(x)$ and $f_2(x)$ in the same system of coordinates. Guess which
distribution is better in the EUM-risk-aversion case, and give heuristic arguments in favor of your
guess. Prove your statement rigorously. Point out a similarity between this exercise and
Exercises 38-39.


41. John agreed with the following strange payment for a job done. A regular coin will be tossed.
In the case of a head, John will be paid \$200. In the case of a tail, a die will be rolled. If the
die shows one or two, John will be paid \$300, otherwise, nothing. Determine the random
payment considering the mixture of the distributions of r.v.’s $X_1 = 200$ (dollars), and $X_2 = 300$ or 0
with probabilities 1/3 and 2/3, respectively.


42. (a) Is the FSD rule defined in Section 3.5.2 a requirement on distributions or on the
preference order?


(b) Does the FSD rule concern only the EUM criterion or all preference orders?


(c) Answer the same questions regarding the SSD.


43. Which criteria from Section 1.2 satisfy the FSD rule?


44. Show rigorously that in the general A-scheme from Section 3.5.5, (a) equivalence sets are
planes or hyper-planes; (b) to determine completely the preference order, it suffices to
determine $n - 1$ equivalent points.


45. Fig.’s 19abcd depict equivalence curves in the A-scheme for the distributions of r.v.’s taking
three values. Which figures correspond to EUM, and which do not?


46. Consider the distributions of all random variables taking only values 0, 10, 20. We identify
any such distribution $F$ with the vector of probabilities $(p_1, p_2, p_3)$. Let $F_1 = (0.1, 0.5, 0.4)$,
$F_2 = (0.2, 0.2, 0.6)$, and $F_3 = (0.1, 0.8, 0.1)$.


(a) Mark all corresponding points in the $(p_1, p_3)$-plane picture.


(b) Find a distribution $F_4$ such that the assertion $F_1 \sim F_2$, $F_3 \sim F_4$ would not contradict
the independence axiom.




FIGURE 19. 


47.* Consider the probability distributions of ALL random variables taking values 0, 10, 20, and
40.


(a) Identify such distributions with points in $R^3$. What region in $R^3$ do we consider in this
case? To what distributions do boundary points of this region correspond?


(b) Assume that you are an EU maximizer. Fix a distribution $F_0$, and consider ALL
distributions $F \sim F_0$. Where do all points corresponding to these distributions lie?

(c) Assume that the following three distributions are equivalent:
(0.2, 0.3, 0.1, 0.4), (0.3, 0, 0, 0.7), (0, 0.6, 0.4, 0). Find one more distribution
equivalent to those mentioned.


48.* Consider the distributions from Exercise 46. You are an EUM, and your utility function is
increasing. What is better, $F_1$ or $F_3$? Find $G_1 = \frac{1}{2}(F_1 + F_2)$, $G_2 = \frac{1}{2}(F_3 + F_2)$. Mark both
points in the A-scheme picture. Which point is better?


49.* Consider an EUM in the situation of Exercise 46. Assume that distributions
(0.1, 0.5, 0.4) and (0.2, 0.3, 0.5) are equivalent. Find one more distribution which is
equivalent to these distributions. Find all such distributions.


50.* You know that Fred is an EUM, and for him “the larger, the better”. You ask Fred to compare 
the following two random variables: both take on values 0,20,40; the first—with probabili- 
ties G, 7 1), respectively, and the second—with probabilities G, és 5): It turns out that Fred 
considers these distributions equivalent for him. 


(a) Is this information enough in order to predict the result of comparison by Fred of ANY 
two random variables with the values mentioned? 


(b) Which distribution would Fred prefer: (4, 1,1) or (3, +, 2)? 


2494 5° 10? 10 
(c) Is Fred a risk lover? If yes, justify the answer. If no, does this mean that he is a risk 
averter? 
Section 4* 


51. Consider Fig.’s 19abcd from Exercise 45 depicting equivalence curves in the A-scheme for
the distributions of r.v.’s taking three values. Which figures correspond to axioms you know?
Identify these axioms.


140 


52; 


53; 


54. 


55; 


56. 


57. 


58. 


59. 


60. 


6l. 


62.* 


1. COMPARISON OF RANDOM VARIABLES 


Consider distributions F,, F2, and F; from Exercise 46. Find a non-trivial and meaningful 
example of a distribution F4 such that the relations Fj ~ Fo, F> ~ F4 would not contradict 
the betweenness axiom. 


Consider probability distribution of the income taking values 0,20, and 40 with probabilities 
P1,P2,P3, respectively. Specify all points in the plane (p;,p3), which correspond to the 
distributions under consideration. 


Assume that for Jane equivalence curves on the diagram mentioned are curves given by the 
formula 
p3=c+e?!, where —e<c<0. 


(a) Graph these curves, and realize why we need the condition —e < c <0. 
(b) Is Jane a EUM? Do her preferences meet the Betweenness Axiom? 


(c) Since we consider only the narrow class of distributions concentrated on the set 0, 20, 40, 
we cannot figure out whether Jane is a risk lover or not. Nevertheless, reasoning heuris- 
tically, give an argument that we should not expect that Jane is a risk lover. 


(d) Jane follows the rule “the larger the better”. Which distribution is better for her: 
(0.1, 0.5,0.4) or (0.2, 0.2, 0.6) ? 


In Example 4.3.1-3, find the certainty equivalent of a r.v. X = d > 0 or 0 with probabilities p 
and q, respectively. Analyze and interpret the case p close to one. 


Let g(s) defined in Section 4.3 equal 1 — e~* for s > 0, and s for s < 0. Graph g(s). Let c be 
the certainty equivalent of a r.v. X. Is it true that c < E{X} ? Estimate the certainty equivalent 
for X equal to 1 or 0 with equal probabilities. 


Show that all lines (4.2.8) intersect at one point. (Advice: Consider two lines, lı and h, 
corresponding to two different values dı and dy. They may intersect only at a point in lo 
where W (F) is not defined since otherwise at this point W (F) would have taken two different 
values.) 


Show that W(F)) in (4.2.6) is not equal, in general, to aW(F,) + (1 — a)W (F>). (Hint: 
The answer does not require long calculations. Consider, for example a = 0.5 and the case 
[.w(a)d Fy (x) = 2 fS w(x) dF (x).) 

In the situation of Example 4.3.1-2, find the maximal premium Gmax assuming w = d and the 
loss to have the same distribution as X in this example. 


Let an investor follow the RDEU criterion with ¥(p) = 1 — (1 — p)?. In other words, ¥ 
behaves as a power function for p close to one, that is, for p’s corresponding to large values 
of r.v.’s. Let X be an exponential rv. with a distribution F. Show that in this case, the 
transformation of F corresponds to dividing X by B. 


Find $G_{\max}$ in the situation of Example 4.4.2-1 in the case when $P(\xi = 0) = 0.9$, $P(\xi = 1) = 0.1$.
Section 5 


Find the deductible d in (5.1.4) for 4 = m/2 for two cases: (a) X is exponential, E{X } = m; 
(b) X is uniform on [0,2m]. (So, the expected values are the same.) Compare results. Explain 
the difference from a heuristic point of view. 


Which regions in Fig.18 have the same area? 


Chapter 2 


An Individual Risk Model for a Short 
Period 


This chapter and Chapter 3 are devoted to two connected topics. First, we explore the 
structure of the payments an insurance organization provides during a given time period. 
Secondly, we consider the solvency of the insurance process or, more specifically, the size 
of premiums sufficient for the insurance organization to meet its obligations. 

We begin with one insured unit. It may be an individual or an organization; we will 
use the terms “client” or “insured”. In this case, our goal is to explore the probability 
distribution of the payment of the company to a particular insured. 

Next we consider the aggregate claim coming from many clients. Here, the most im- 
portant problem consists in approximating the distribution of the aggregate claim in the 
case when the number of clients is large. “Good” approximations will allow us to estimate 
premiums acceptable for the insurer. 

Chapter 3 concerns the portfolio of risks as a whole. In this case, the objects of study are 
the random number of claims coming into the insurance organization, the total payment the 
organization should provide, and again acceptable premiums. 

In both chapters, we consider only a single and sufficiently short period of time, and 
all models in these chapters are static. Once we have studied how the company functions 
during a short period, we will proceed in Chapter 4 to insurance processes which run over 
extended time periods; that is, to dynamic models. 


1 THE DISTRIBUTION OF AN INDIVIDUAL PAYMENT 
1.1 The distribution of the loss given that it has occurred 
1.1.1 Characterization of tails 

Let us consider one client. If the client suffers damages potentially covered by the insurance contract, we will talk about a loss event. Denote by $\xi$ the r.v. of the loss given that the loss event occurred, and by $q$ the probability of the loss event. Usually, although not always, $q$ is small. Regarding the r.v. $\xi$, the term severity is also frequently used.

Thus, the real loss of the insured is the r.v.

$$X = \begin{cases} \xi & \text{with probability } q, \\ 0 & \text{with probability } 1 - q. \end{cases} \tag{1.1.1}$$




This may be rewritten as

$$X = I\xi, \tag{1.1.2}$$

where the indicator of the loss event

$$I = \begin{cases} 1 & \text{with probability } q, \\ 0 & \text{with probability } 1 - q. \end{cases}$$


In this section, we do not touch on the issue of the timing of payments, viewing $\xi$ as the
total amount of losses during the period under consideration. So, $I$ is the indicator that at
least one loss event has occurred.

If the insurance contract covers the whole possible loss, the amount paid by the insurance 
organization equals X. If the damage is not covered in full, the amount paid is a part of X. 

This subsection concerns merely the distribution of $\xi$. Before classifying possible types
of this distribution, we first recall

The exponential distribution which, in a certain sense, may be viewed as a key case.
As was defined in Section 0.3.2.2, this is the distribution of a positive r.v. $\xi = \xi_a$ with the
density

$$f_a(x) = \begin{cases} 0 & \text{if } x < 0, \\ a e^{-ax} & \text{if } x \ge 0. \end{cases} \tag{1.1.3}$$


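The parameter $a$ is positive and plays the role of a scale parameter. If a r.v. $\xi_1$ has the density $f_1(x)$, then the r.v. $\xi_a = \xi_1/a$ has density $f_a(x)$ (see Section 0.3.2.2 and Exercise 1). The corresponding distribution function (d.f.) is

$$F_a(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 - e^{-ax} & \text{if } x \ge 0, \end{cases} \tag{1.1.4}$$

the tail, i.e., the probability that $\xi_a$ exceeds a level $x \ge 0$, is

$$P(\xi_a > x) = e^{-ax}, \tag{1.1.5}$$

and

$$E\{\xi_a\} = 1/a, \qquad \mathrm{Var}\{\xi_a\} = 1/a^2. \tag{1.1.6}$$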


(See Section 0.3.2.2 for detail.) 
In Section 0.3.2.2, we have also shown that the exponential distribution has the unique 


Lack-of-Memory (or Memoryless) Property: for any $x, y \ge 0$,

$$P(\xi > x + y \mid \xi > x) = P(\xi > y), \tag{1.1.7}$$

where $\xi = \xi_a$ is defined above.


Assume that we have preliminary information that the loss has exceeded a level x. Then
the l.-h.s. of (1.1.7) is the conditional probability that the real loss will exceed x by more than y.
If the memoryless property holds, this probability does not depend at all on the value of x.

We may interpret (1.1.7) in a similar way in the case when $\xi$ is not the size of a loss but
the duration of a process; for example, the duration of a job to be done or the time between
two consecutive claims arriving at a company. Such examples will turn out to be important
when we deal with dynamic models.




Assume that a process, a job for example, has already lasted x hours. Then the l.-h.s. of
(1.1.7) is the conditional probability that the job will last more than y hours longer. In the case of the
memoryless property, this conditional probability does not depend on when the job began.
One may view the situation as if at each moment the process starts over from the very
beginning; so to speak, the system “does not remember” what happened before.
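The memoryless property (1.1.7) is easy to check numerically. The following minimal sketch uses only the exponential tail (1.1.5); the rate `a = 0.5` and the increment `y = 2.0` are arbitrary values assumed for illustration, and the function names are not taken from the text.

```python
import math

def exp_tail(x, a):
    """Tail P(xi > x) = exp(-a x) of the exponential distribution, see (1.1.5)."""
    return math.exp(-a * x)

def conditional_tail(tail, x, y):
    """P(xi > x + y | xi > x) = tail(x + y) / tail(x) for a given tail function."""
    return tail(x + y) / tail(x)

a = 0.5   # assumed rate parameter, chosen only for illustration
y = 2.0   # assumed increment

# For the exponential tail, the conditional probability is the same for every x
# and coincides with the unconditional probability P(xi > y).
values = [conditional_tail(lambda t: exp_tail(t, a), x, y) for x in (0.0, 1.0, 10.0)]
unconditional = exp_tail(y, a)
```

Every entry of `values` coincides with `unconditional` up to rounding, whatever level x has already been exceeded.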

Certainly, the property under discussion is very special, and it is important to keep in 
mind that 


The exponential distribution is the only distribution with the memoryless property. 


A not difficult although non-trivial proof may be found in many textbooks on Probability 
(see, e.g., [38], [116], [120]). 

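For other distributions, $P(\xi > x + y \mid \xi > x)$ depends on x. It is especially important to consider $P(\xi > x + y \mid \xi > x)$ for large x's or, putting it another way, for large deviations. Formally, we let $x \to \infty$ and set

$$Q(y) = \lim_{x \to \infty} P(\xi > x + y \mid \xi > x) = \lim_{x \to \infty} \frac{P(\xi > x + y,\ \xi > x)}{P(\xi > x)} = \lim_{x \to \infty} \frac{P(\xi > x + y)}{P(\xi > x)}.$$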
Consider three typical situations. 


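a) $P(\xi > x) \sim C x^{-\alpha}$, as $x \to \infty$, for some constants $C$ and $\alpha > 0$.¹ Then

$$Q(y) = \lim_{x \to \infty} \frac{C (x + y)^{-\alpha}}{C x^{-\alpha}} = \lim_{x \to \infty} \frac{1}{(1 + y/x)^{\alpha}} = 1 \quad \text{for any } y.$$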


It may be interpreted as follows. If we have information that $\xi$ has exceeded a large level x, then with large probability the surplus $\xi - x$ will also be large. For instance, in the job example, it would mean that if the job has lasted for a long time, then we should not expect that it will be over soon.

We mentioned already that the probability $P(\xi > x)$ for large x's is often called the tail of the distribution. The particular case above is classified as that of a heavy tail. Later we will consider a general definition.


b) $P(\xi > x) \sim C e^{-ax}$, as $x \to \infty$, for some constants $C$ and $a > 0$. In this case, the tail is asymptotically exponential for large values of x, and

$$Q(y) = \lim_{x \to \infty} \frac{e^{-a(x + y)}}{e^{-ax}} = e^{-ay}.$$

Thus, the conditional distribution, at least asymptotically, for large x's, is exponential.


¹The symbol $u(x) \sim v(x)$ means that $u(x)/v(x) \to 1$ as $x \to \infty$, that is, u(x) and v(x) are close for large x's.




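c) $P(\xi > x) \sim C e^{-a x^{\gamma}}$, where $C$ and $a$ are positive and $\gamma > 1$. In this case, the tail vanishes faster than any exponential function, and

$$Q(y) = \lim_{x \to \infty} \exp\{-a[(x + y)^{\gamma} - x^{\gamma}]\} = 0 \quad \text{for any } y > 0.$$

> (To show this, first note that $(1 + z)^{\gamma} - 1 \sim \gamma z$ as $z \to 0$, which may be verified, for example, by L'Hôpital's rule. Then,

$$(x + y)^{\gamma} - x^{\gamma} = x^{\gamma}\left[(1 + y/x)^{\gamma} - 1\right] \sim x^{\gamma} \cdot \gamma \frac{y}{x} = \gamma y x^{\gamma - 1} \to \infty, \quad \text{as } x \to \infty, \text{ for any } y > 0 \text{ and } \gamma > 1.)$$ <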


In this case, if we have information that $\xi$ has exceeded a given large value of x, this
means that with high probability the real value of $\xi$ is close to x. In the job example,
this means that if the job has lasted for a long time, we expect that it will end soon.


The last two cases are classified as those of light tails. 
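The three limits Q(y) above can be illustrated by evaluating the ratio P(ξ > x + y)/P(ξ > x) at increasing levels x. The sketch below is only an illustration under assumed parameter values (α = 2, a = 1, γ = 2, y = 1); the constant C cancels in each ratio, so it is omitted.

```python
import math

def ratio_power(x, y, alpha):
    """Case a): tail ~ C x^(-alpha); the ratio equals (1 + y/x)^(-alpha)."""
    return (1.0 + y / x) ** (-alpha)

def ratio_exponential(x, y, a):
    """Case b): tail ~ C exp(-a x); the level x cancels out of the ratio."""
    return math.exp(-a * y)

def ratio_superexp(x, y, a, gamma):
    """Case c): tail ~ C exp(-a x^gamma) with gamma > 1."""
    return math.exp(-a * ((x + y) ** gamma - x ** gamma))

y = 1.0
for x in (10.0, 1e3, 1e5):
    print(x,
          ratio_power(x, y, alpha=2.0),             # approaches 1 (heavy tail)
          ratio_exponential(x, y, a=1.0),           # stays at exp(-1)
          ratio_superexp(x, y, a=1.0, gamma=2.0))   # approaches 0
```

As x grows, the first column of ratios tends to one, the second does not move, and the third collapses to zero, in line with cases a), b) and c).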


Let us turn to a general classification. We restrict ourselves to positive r.v.'s. For the distribution F of a r.v. $\xi$, set $\bar F(x) = 1 - F(x) = P(\xi > x)$, the tail of F.
A distribution F is said to be light-tailed if for some positive c and B,

$$\bar F(x) \le B e^{-cx} \tag{1.1.8}$$

for sufficiently large x's, more precisely for all $x \ge x_{cB}$, where $x_{cB}$ is some number perhaps depending on c and B.

Thus, we are interested in the behavior of the tail $\bar F(x)$ for large x. The significance of the
definition above is that $P(\xi > x) \to 0$, as $x \to \infty$, as fast as an exponential tail (with the parameter
c) or faster.

Note also that the constant B is involved in (1.1.8) merely to make the verification of the 
condition easier. As a matter of fact, without loss of generality, we can set B = 1. 

> Indeed, if (1.1.8) holds, then $\bar F(x) \le B e^{-cx} = (B e^{-cx/2}) e^{-cx/2}$. The function $B e^{-cx/2} \to 0$ as $x \to \infty$, so for sufficiently large x's, we have $B e^{-cx/2} \le 1$ and $\bar F(x) \le e^{-cx/2}$; that is, (1.1.8) is true for B = 1 and c replaced by c/2. <

If there is no c and B for which the above property is true, we call the distribution heavy- 
tailed. 


S. Asmussen cites in [10] an actuarial folklore definition of a heavy-tailed distribution F of a r.v. $\xi$ as that for which “20% of the number of claims account for more than 80% of the total value of the claims”. One may clarify it as follows. Let $q = q_{0.8}(\xi)$, the 0.8-quantile of $\xi$ (for the definition of quantile see Section 0.1.3.4). Then, if we consider a large number of independent claims with the same distribution F, asymptotically, 20% of them will be larger than q. The “part of the mean value” for x > q is $m(q) = \int_q^{\infty} x \, dF(x)$. The above heuristic definition means that $m(q)/m(0) > 0.8$, where $m(0) = E\{\xi\}$.
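To see how the folklore criterion separates distributions, one may compute the share m(q)/m(0) in closed form for two simple tails. The sketch below is an illustration only: it assumes an exponential distribution with rate 1 (the share does not depend on the scale) and a power tail $x^{-\alpha}$ on $[1, \infty)$ with $\alpha > 1$; the function names are hypothetical.

```python
import math

def share_exponential(p=0.8):
    """m(q)/m(0) for the exponential distribution with rate 1 (the share is scale-free)."""
    q = -math.log(1.0 - p)            # p-quantile: 1 - exp(-q) = p
    # integral of x * exp(-x) over (q, infinity) equals (1 + q) * exp(-q); the mean is 1
    return (1.0 + q) * math.exp(-q)

def share_pareto(alpha, p=0.8):
    """m(q)/m(0) for the tail x^(-alpha), x >= 1; requires alpha > 1 (finite mean)."""
    q = (1.0 - p) ** (-1.0 / alpha)   # p-quantile: q^(-alpha) = 1 - p
    # m(q) = alpha/(alpha - 1) * q^(1 - alpha) and m(0) = alpha/(alpha - 1)
    return q ** (1.0 - alpha)

print(share_exponential())    # about 0.52: the largest 20% of claims carry about half the total
print(share_pareto(1.1))      # above 0.8: the folklore criterion is met
print(share_pareto(3.0))      # below 0.8: a power tail, yet the criterion is not met
```

The last line shows that the folklore rule is stricter than the formal definition: a power tail with α = 3 is heavy-tailed in the sense of (1.1.8) but does not satisfy the 80/20 rule.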


We say that a tail $\bar G(x)$ is heavier than $\bar F(x)$ if

$$\frac{\bar F(x)}{\bar G(x)} \to 0 \quad \text{as } x \to \infty. \tag{1.1.9}$$

In other words, $\bar F(x)$ is vanishing faster than $\bar G(x)$.




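(For the reader who prefers the big O and little o notation introduced in Section 0.4.1, note that the definition (1.1.8) amounts to the relation $\bar F(x) = O(e^{-cx})$ for some c > 0, and the definition (1.1.9) may be rewritten as $\bar F(x) = o(\bar G(x))$.)

EXAMPLE 1. Let $\bar F(x) \sim x^{-2}$ and $\bar G(x) \sim x^{-1}$. Clearly, $\bar F(x) \to 0$ faster than $\bar G(x)$. Formally, we have

$$\frac{\bar F(x)}{\bar G(x)} \sim \frac{x^{-2}}{x^{-1}} = \frac{1}{x} \to 0 \quad \text{as } x \to \infty.$$

So, $\bar G(x)$ is heavier than $\bar F(x)$.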


1.1.2 Some particular light-tailed distributions 


1. The exponential distribution itself is, obviously, light-tailed since $e^{-ax} \le e^{-ax}$, and hence (1.1.8) is true for c = a and B = 1.


2. Distributions of bounded r.v.'s. Let a r.v. $\xi$ be less than or equal to some constant b with probability one. Then $\bar F(x) = P(\xi > x) = 0$ for any $x \ge b$, and (1.1.8) clearly holds for any non-negative c and B.

Now consider an unbounded r.v. with a distribution G, that is, $\bar G(x) > 0$ for all x. Then $\bar F(x)/\bar G(x) = 0$ for $x \ge b$, and $\bar G(x)$ is heavier than $\bar F(x)$.

3. Mixtures of exponentials. (For the definition of mixture and a discussion, see Section
0.1.3.5.) Let the distribution $F = \sum_{j=1}^{k} w_j F_j$, where k is an integer, weights $w_j$ are
positive, $\sum_{j=1}^{k} w_j = 1$, and $F_j$ is the exponential distribution with positive parameter

aj. For example, F (x) = (1 — e™*) + 4 (1 — e™™). We can write 


$$\bar F(x) = \sum_{j=1}^{k} w_j \bar F_j(x) = \sum_{j=1}^{k} w_j e^{-a_j x}.$$


In our particular example, F(x) = $e“ + fe >". Then, setting c = min{a;}, we have 


$$\bar F(x) \le \sum_{j=1}^{k} w_j e^{-cx} = e^{-cx} \sum_{j=1}^{k} w_j = e^{-cx},$$

and (1.1.8) is true. In the example above, F (x) = ze * + e7% < e™*, and (1.1.8) is 

true for c =B=1. 


4. The Γ (gamma)-distribution. This is the distribution with the density given by

$$f_{a\nu}(x) = \frac{a^{\nu}}{\Gamma(\nu)} x^{\nu - 1} e^{-ax} \quad \text{for } x \ge 0, \tag{1.1.10}$$

and $f_{a\nu}(x) = 0$ for x < 0. Parameters $a$ and $\nu$ are positive. A detailed description is given in Section 0.3.2.3; for $\nu = 1$, (1.1.10) determines an exponential distribution.

The Γ-distribution is light-tailed.




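> To show this, let us write

$$\bar F_{a\nu}(x) = P(\xi > x) = \int_x^{\infty} f_{a\nu}(y)\,dy = \int_x^{\infty} \frac{a^{\nu}}{\Gamma(\nu)} y^{\nu - 1} e^{-ay}\,dy = \int_x^{\infty} \left[\frac{a^{\nu}}{\Gamma(\nu)} y^{\nu - 1} e^{-ay/2}\right] e^{-ay/2}\,dy.$$

For any a > 0, the function

$$K(y) = \frac{a^{\nu}}{\Gamma(\nu)} y^{\nu - 1} e^{-ay/2} \to 0 \quad \text{as } y \to \infty. \tag{1.1.11}$$

Then there exists a number d = d(a) such that $K(y) \le a/2$ for all $y \ge d(a)$. Then, for $x \ge d(a)$,

$$\bar F_{a\nu}(x) = \int_x^{\infty} K(y) e^{-ay/2}\,dy \le \int_x^{\infty} \frac{a}{2} e^{-ay/2}\,dy = e^{-ax/2}.$$

Thus, if we set c = a/2, then for all $x \ge d(2c)$,

$$\bar F_{a\nu}(x) \le e^{-cx}.$$ <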


5. The Weibull distribution. Another way to generalize the exponential distribution is to consider the tail

$$\bar F(x) = \exp\{-a x^{r}\} \quad \text{for } x \ge 0, \tag{1.1.12}$$

where a and r are positive parameters. For r = 1, this is an exponential distribution. For r > 1, we deal with a light-tailed distribution. Suggestions for how to show it rigorously are given in Exercise 4.


1.1.3 Some particular heavy-tailed distributions 


1. The log-normal distribution. This is the distribution of the r.v. $\xi = e^{\eta}$, where $\eta$ is a normal r.v. In other words, $\xi$ is log-normal if $\ln \xi$ is normal. This distribution appears in numerous applications in Economics, Physics, and other areas. Let $a = E\{\eta\}$ and $b^2 = \mathrm{Var}\{\eta\}$. Then, we can represent $\eta$ by $\eta = a + b\eta_0$, where $\eta_0$ is standard normal. Consequently,

$$\xi = e^{a + b\eta_0}. \tag{1.1.13}$$

Since the m.g.f. of $\eta_0$ is $M_{\eta_0}(z) = e^{z^2/2}$ (see Section 0.4.3.6),

$$E\{\xi\} = E\{e^{a + b\eta_0}\} = e^{a} E\{e^{b\eta_0}\} = e^{a} M_{\eta_0}(b) = e^{a + b^2/2}. \tag{1.1.14}$$

Similarly,

$$E\{\xi^2\} = E\{e^{2a + 2b\eta_0}\} = e^{2a} M_{\eta_0}(2b) = e^{2(a + b^2)}, \tag{1.1.15}$$

and, hence,

$$\mathrm{Var}\{\xi\} = e^{2(a + b^2)} - e^{2a + b^2} = e^{2a + b^2}\left(e^{b^2} - 1\right). \tag{1.1.16}$$

The d.f. of $\xi$ is

$$F(x) = P(e^{a + b\eta_0} \le x) = P(\eta_0 \le [\ln x - a]/b) = \Phi([\ln x - a]/b),$$




where $\Phi(\cdot)$ is the standard normal d.f. Then the density

$$f(x) = \frac{d}{dx}\Phi((\ln x - a)/b) = \frac{1}{\sqrt{2\pi}\, b x} \exp\{-(\ln x - a)^2 / 2b^2\}.$$

The log-normal distribution is heavy-tailed, that is, $\bar F(x) \to 0$, as $x \to \infty$, slower than any exponential function.


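> To prove this, we use the estimate (0.3.2.18). From the first inequality there, it follows that for $x \ge 2$,

$$1 - \Phi(x) \ge \tfrac{3}{4} x^{-1} \varphi(x),$$

where $\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is the standard normal density.

Since $\eta_0$ is symmetric, without loss of generality we can set b > 0. Note also that for $x > e^{-a}$, we have $-a < \ln x$, and $\frac{1}{b}(\ln x - a) < \frac{2}{b}\ln x$. The last quantity is larger than 2 for $x > e^{b}$. Consider $x > \max\{e^{-a}, e^{b}\}$ (note that a may be negative). Then

$$\bar F(x) = 1 - \Phi\left(\tfrac{1}{b}(\ln x - a)\right) \ge 1 - \Phi\left(\tfrac{2}{b}\ln x\right) \ge \frac{3}{4} \cdot \frac{b}{2\ln x}\, \varphi\left(\tfrac{2}{b}\ln x\right)$$

$$= \frac{3b}{8\sqrt{2\pi}} (\ln x)^{-1} \exp\left\{-\frac{2}{b^2}(\ln x)^2\right\} = \frac{3b}{8\sqrt{2\pi}} \exp\left\{-\frac{2}{b^2}(\ln x)^2 - \ln\ln x\right\}.$$

Then for any c > 0 and $x > \max\{e^{-a}, e^{b}\}$,

$$\frac{\bar F(x)}{\exp\{-cx\}} \ge \frac{3b}{8\sqrt{2\pi}} \exp\left\{cx - \frac{2}{b^2}(\ln x)^2 - \ln\ln x\right\}.$$

It is a standard Calculus exercise to show that $cx - \frac{2}{b^2}(\ln x)^2 - \ln\ln x \to \infty$ as $x \to \infty$. Hence,

$$\frac{\bar F(x)}{\exp\{-cx\}} \to \infty, \quad \text{as } x \to \infty, \text{ for any } c > 0.$$

This implies that there is no c, B > 0 for which (1.1.8) is true. <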


2. The Pareto distribution. Consider a r.v. $\xi_1$ for which $P(\xi_1 > x)$ is the function

$$\bar F_1(x) = \begin{cases} 1 & \text{for } x < 1, \\ x^{-\alpha} & \text{for } x \ge 1, \end{cases} \tag{1.1.17}$$

where $\alpha > 0$ is a parameter. Thus, in this case, the tail is vanishing as a power function. Since $P(\xi_1 > 1) = 1$, the r.v. $\xi_1$ takes on values from $[1, \infty)$ with probability one.

We call this distribution a Pareto distribution, as well as the distributions of all linear transformations of $\xi_1$; more specifically, the distributions of r.v.'s $\xi = b\xi_1 + d$ for all d and b > 0. The parameter b may be viewed as a scale parameter. Since $\xi_1$ assumes values from $[1, \infty)$, the r.v. $\xi$ takes on values from $[b + d, \infty)$, so b + d may be called a location parameter.


Often, the term “Pareto distribution” is applied to the distribution with the tail

$$\bar F(x) = \left(\frac{\theta}{x + \theta}\right)^{\alpha} \quad \text{for } x \ge 0, \tag{1.1.18}$$




where the parameter $\theta > 0$. In Exercise 6f, we show that this is the distribution of the r.v. $\theta\xi_1 - \theta = \theta(\xi_1 - 1)$. In this case, $\theta$ is a scale parameter. Indeed, if a r.v. $Z_1$ has distribution (1.1.18) with $\theta = 1$, then the r.v. $Z_\theta = \theta Z_1$ has distribution (1.1.18).


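Let $F_1(x)$ be the d.f. of $\xi_1$. Then its density $f_1(x)$ is given by

$$f_1(x) = F_1'(x) = -\bar F_1'(x) = \begin{cases} 0 & \text{for } x < 1, \\ \alpha x^{-\alpha - 1} & \text{for } x > 1. \end{cases}$$

The pth moment of $\xi_1$ is

$$E\{\xi_1^{p}\} = \int_1^{\infty} x^{p} f_1(x)\,dx = \frac{\alpha}{\alpha - p} \tag{1.1.19}$$

if $p < \alpha$. For $p \ge \alpha$, the moment does not exist. The reader is encouraged to carry out this simple integration on her/his own. See also Exercise 6 for other questions about this distribution.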


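Thus, for a r.v. $\xi$ defined as $b\xi_1 + d$, the expectation $E\{\xi\}$ exists if $\alpha > 1$, and the variance $\mathrm{Var}\{\xi\}$ exists if $\alpha > 2$. In the last case,

$$\mathrm{Var}\{\xi_1\} = E\{\xi_1^2\} - (E\{\xi_1\})^2 = \frac{\alpha}{(\alpha - 2)(\alpha - 1)^2}. \tag{1.1.20}$$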

As we know from Calculus, any power function vanishes slower than any exponential 
function, which implies that for a Pareto distribution—whichever definition above we 
choose—there is no c for which (1.1.8) is true. 


The tail (1.1.17) may be interpreted as “very heavy”. In Exercise 6, we discuss how the degree of “heaviness” depends on $\alpha$.


3. The Weibull distribution. For r < 1, the tail in (1.1.12) vanishes slower than any 
exponential function, so the distribution is heavy-tailed. See also Exercise 4 for 
detail. 
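The moment formulas (1.1.14)-(1.1.16) for the log-normal distribution and (1.1.19)-(1.1.20) for the Pareto distribution (1.1.17) can be collected in a few lines of code. This is a minimal sketch; the function names are hypothetical, and returning infinity for a non-existent moment is a convention chosen here, not something prescribed by the text.

```python
import math

def lognormal_mean_var(a, b):
    """Mean and variance of xi = exp(a + b * eta0), eta0 standard normal."""
    mean = math.exp(a + b * b / 2.0)          # (1.1.14)
    second = math.exp(2.0 * (a + b * b))      # (1.1.15)
    return mean, second - mean ** 2           # (1.1.16)

def pareto_moment(alpha, p):
    """E{xi_1^p} = alpha / (alpha - p) for p < alpha, see (1.1.19)."""
    if p >= alpha:
        return math.inf                       # convention: the moment does not exist
    return alpha / (alpha - p)

def pareto_variance(alpha):
    """Var{xi_1} from (1.1.20); requires alpha > 2."""
    if alpha <= 2.0:
        return math.inf                       # convention: the variance does not exist
    return alpha / ((alpha - 2.0) * (alpha - 1.0) ** 2)

mean, var = lognormal_mean_var(0.0, 1.0)
print(mean, var)                              # exp(1/2) and e * (e - 1)
print(pareto_moment(3.0, 1.0), pareto_variance(3.0))
```

For α = 3 the variance 0.75 agrees with the difference of the second moment 3 and the squared mean 1.5², which is a quick consistency check of (1.1.19) against (1.1.20).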




1.1.4 The asymptotic behavior of tails and moments 


The facts considered in this subsection may help classify tails without direct calculations. 
We use below the Calculus notation 


$$f(x) = O(g(x)) \quad \text{and} \quad f(x) = o(g(x)), \tag{1.1.21}$$


where f(x) and g(x) are functions. A detailed explanation of the big O and small o notation 
and some examples are given in Appendix, Section 4.1. 




To simplify formulations, we restrict ourselves to a positive r.v. $\xi$. Let $\bar F(x) = P(\xi > x)$, and let $m_k = E\{\xi^{k}\}$, the kth moment of $\xi$.
First, note that if $m_k < \infty$ for some k, then

$$\bar F(x) = O(x^{-k}).$$

This immediately follows from the Markov inequality (0.2.5.2).

Secondly, if F is light-tailed, then $m_k < \infty$ for all k. Accordingly, if $m_k = \infty$ for some k, the distribution is heavy-tailed.

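A direct way to show this is to set $u(x) = x^{k}$ in the general representation (0.2.2.1). We have

$$m_k = \int_0^{\infty} x^{k}\,dF(x) = \int_0^{\infty} [1 - F(x)]\,d(x^{k}) = k\int_0^{\infty} \bar F(x) x^{k-1}\,dx. \tag{1.1.22}$$

If F is light-tailed, then by definition (1.1.8) and (1.1.22), for some $x_{cB}$,

$$m_k = k\int_0^{x_{cB}} \bar F(x) x^{k-1}\,dx + k\int_{x_{cB}}^{\infty} \bar F(x) x^{k-1}\,dx \le k\int_0^{x_{cB}} x^{k-1}\,dx + kB\int_{x_{cB}}^{\infty} e^{-cx} x^{k-1}\,dx.$$

The first integral is finite, and the second is finite for any c > 0.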

However, the finiteness of all moments is not sufficient for the distribution to be light-tailed. It may happen that $m_k < \infty$ for all k, but the distribution is heavy-tailed.

As an example, one may consider the Weibull distribution (1.1.12), say, with r = 1/2 and a = 1. Then $\bar F(x) = \exp\{-\sqrt{x}\}$. In Example 1.1.3-3, we saw that this distribution is heavy-tailed. However, it has all moments. Indeed, by (1.1.22), $m_k = k\int_0^{\infty} e^{-\sqrt{x}} x^{k-1}\,dx < \infty$.

> To prove that the last integral is finite, we recall that $e^{-x} = O(x^{-m})$ as $x \to \infty$ for any fixed m. In particular, $e^{-\sqrt{x}} = O((\sqrt{x})^{-2k-2}) = O(x^{-k-1})$. Hence, $e^{-\sqrt{x}} x^{k-1} = O(x^{-2})$. The function $O(x^{-2}) \le C x^{-2}$ for a constant C and large x's. Such a function is integrable. <
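For this particular tail, the moments can also be computed explicitly, which gives an independent check of their finiteness. The following derivation is a supplementary sketch, not part of the original argument; it uses only the substitution $x = t^2$ in (1.1.22) and the standard identity $\Gamma(n) = (n-1)!$ for integer n.

```latex
m_k = k\int_0^{\infty} e^{-\sqrt{x}}\, x^{k-1}\,dx
    \;\overset{x = t^2}{=}\; k\int_0^{\infty} e^{-t}\, t^{2k-2}\cdot 2t\,dt
    = 2k\int_0^{\infty} e^{-t}\, t^{2k-1}\,dt
    = 2k\,\Gamma(2k) = (2k)! .
```

In particular, $m_1 = 2$ and $m_2 = 24$: all moments are finite, although they grow very fast with k.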


The following proposition gives more insight into the asymptotic behavior of tails. Let $M_\xi(z)$ be the moment generating function (m.g.f.) of $\xi$.

Proposition 1 The distribution of $\xi$ is light-tailed if and only if $M_\xi(z) < \infty$ for some z > 0.


> Proof. Let $M_\xi(z) < \infty$ for some z > 0. We apply the inequality for deviations from Proposition 0.4. Set $u(x) = e^{z|x|}$ in (0.2.5.1). Since $\xi$ is positive, by (0.2.5.1), for x > 0,

$$P(\xi > x) = P(|\xi| > x) \le \frac{E\{e^{z|\xi|}\}}{e^{zx}} = e^{-zx} E\{e^{z\xi}\} = e^{-zx} M_\xi(z). \tag{1.1.23}$$

Because $M_\xi(z) < \infty$, condition (1.1.8) holds with c = z and $B = M_\xi(z)$.
Conversely, let (1.1.8) be true for some c, B > 0, and $x \ge x_{cB}$. We have

$$M_\xi(z) = \int_0^{\infty} e^{zx}\,dF(x) = -\int_0^{\infty} e^{zx}\,d\bar F(x) = -e^{zx}\bar F(x)\Big|_0^{\infty} + z\int_0^{\infty} e^{zx}\bar F(x)\,dx.$$

Now, $\lim_{x \to \infty} e^{zx}\bar F(x) \le B \lim_{x \to \infty} e^{zx} e^{-cx} = 0$ for z < c. Hence,

$$M_\xi(z) = \bar F(0) + z\int_0^{\infty} e^{zx}\bar F(x)\,dx \le \bar F(0) + z\int_0^{x_{cB}} e^{zx}\,dx + zB\int_{x_{cB}}^{\infty} e^{zx} e^{-cx}\,dx.$$

The first integral is finite, and the second is finite for z < c. <
An interesting discussion on possible tails of the loss distributions, especially heavy tails, 
may be found, e.g., in [37] and [93]. 
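The bound (1.1.23) used in the proof can be examined on the exponential distribution, for which both the tail and the m.g.f. are explicit. The sketch below assumes the standard fact that the m.g.f. of the exponential distribution with rate a is a/(a - z) for z < a; the values a = 1 and z = 0.5 are arbitrary illustrative choices.

```python
import math

def exp_mgf(z, a):
    """M(z) = a / (a - z) for the exponential distribution with rate a; infinite for z >= a."""
    if z >= a:
        return math.inf
    return a / (a - z)

def tail_bound(x, z, a):
    """Right-hand side of (1.1.23): exp(-z x) * M(z)."""
    return math.exp(-z * x) * exp_mgf(z, a)

a, z = 1.0, 0.5   # assumed rate and an admissible argument z < a
# Each triple holds the level x, the exact tail exp(-a x), and the bound (1.1.23).
checks = [(x, math.exp(-a * x), tail_bound(x, z, a)) for x in (0.5, 1.0, 5.0, 20.0)]
for x, exact, bound in checks:
    print(x, exact, bound)
```

The bound is not tight (here it decays at rate z = 0.5 instead of a = 1), but it is exponential, which is all that the definition (1.1.8) requires.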


1.2 The distribution of the loss 


Next, we consider the r.v. X from (1.1.1). We assume $\xi$ (the loss in the case where the loss event has occurred) to be positive. Then X = 0 only if I = 0, and

$$P(X = 0) = P(I = 0) = 1 - q. \tag{1.2.1}$$

Furthermore, for $x \ge 0$, the loss X > x if, first, the loss event has occurred (the probability is q), and secondly, the loss $\xi > x$ (the probability is $\bar F_\xi(x) = P(\xi > x)$, the tail of the distribution of $\xi$). Hence, $P(X > x) = q\bar F_\xi(x)$, and the d.f. of X is

$$F_X(x) = P(X \le x) = 1 - q\bar F_\xi(x). \tag{1.2.2}$$

This also may be rewritten as

$$F_X(x) = 1 - q + qF_\xi(x), \tag{1.2.3}$$

where $F_\xi(x) = 1 - \bar F_\xi(x)$ is the d.f. of $\xi$.
In particular, since $\xi$ was assumed to be positive, $\bar F_\xi(0) = 1$ and

$$F_X(0) = 1 - q. \tag{1.2.4}$$

Because X is non-negative, $F_X(x) = 0$ for all x < 0.

Now, it is worthwhile to recall that if for a r.v. Z and a number c, it is true that P(Z = c) = 
0, then the d.f. Fz(x) is continuous at the point c. If P(Z = c) =A > 0, then Fz(x) “jumps” 
at the point c, and A is the size of the jump. For detail, see Section 0.1.3.3; in particular, 
Fig.4 there. 

We see that $F_X(x)$ jumps at the point 0 by 1 - q. Hence, P(X = 0) = 1 - q, which was
already stated in (1.2.1).


EXAMPLE 1. Let $\xi$ be exponential and $E\{\xi\} = 1/a$. Then by (1.2.2),

$$F_X(x) = \begin{cases} 1 - q e^{-ax} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases}$$

The graph is shown in Fig.1. It is quite typical.


FIGURE 1.

Now consider moments of the r.v. X. From (1.1.1) it follows that

$$E\{X^{k}\} = qE\{\xi^{k}\}. \tag{1.2.5}$$

Set $\mu = E\{\xi\}$ and $v^2 = \mathrm{Var}\{\xi\}$. Then, in view of (1.2.5),

$$E\{X\} = q\mu, \tag{1.2.6}$$

$$\mathrm{Var}\{X\} = E\{X^2\} - (E\{X\})^2 = qE\{\xi^2\} - q^2(E\{\xi\})^2 = q(v^2 + \mu^2) - q^2\mu^2 = qv^2 + q(1 - q)\mu^2. \tag{1.2.7}$$

EXAMPLE 2. In the situation of Example 1, we get

$$E\{X\} = \frac{q}{a}, \qquad \mathrm{Var}\{X\} = \frac{q}{a^2} + \frac{q(1 - q)}{a^2} = \frac{q(2 - q)}{a^2}. \tag{1.2.8}$$

(Look up the mean and the variance of the exponential distribution in (1.1.6).) 
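Formulas (1.2.6)-(1.2.7) translate directly into code. The sketch below is an illustration; the function name and the numerical values q = 0.1, a = 2 are assumptions made only for this example.

```python
def loss_mean_var(q, mu, v2):
    """Mean and variance of X = I * xi, given q = P(I = 1), mu = E{xi}, v2 = Var{xi}."""
    mean = q * mu                               # (1.2.6)
    var = q * v2 + q * (1.0 - q) * mu ** 2      # (1.2.7)
    return mean, var

q, a = 0.1, 2.0                                 # assumed values, for illustration only
mean, var = loss_mean_var(q, 1.0 / a, 1.0 / a ** 2)
print(mean, var)                                # q/a and q(2 - q)/a^2, as in (1.2.8)
```

Note that the variance of X has two sources: the variability of the severity (the term with v²) and the uncertainty of whether the loss event occurs at all (the term with q(1 - q)).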


1.3 The distribution of the payment and types of insurance 


Let Y be the amount to be paid by the company in accordance with the insurance contract. 
For brevity, we will call it a payment. If the coverage is full, then Y = X. However, the 
insurance often pays only a part of the loss. As in Section 1.5, we set 


Y =r(X) 


and call r(x) a payment function. We assume that r(x) is non-negative, non-decreasing, and $0 \le r(x) \le x$. From the last condition it follows, in particular, that r(0) = 0.
Consider several particular but important cases. In the first case,

$$r(x) = r_{1d}(x) = \begin{cases} 0 & \text{if } x \le d, \\ x - d & \text{if } x > d, \end{cases} \tag{1.3.1}$$

where the payment policy involves a deductible d (the excess-of-loss type; see Section 1.5 for detail).
Next, consider the payment function

$$r(x) = r_{2s}(x) = \begin{cases} x & \text{if } x \le s, \\ s & \text{if } x > s, \end{cases} \tag{1.3.2}$$

where s is a maximal or limit payment.
The combination of these types, when both restrictions (a deductible and a limit payment) are included, is given by

$$r(x) = r_{3ds}(x) = \begin{cases} 0 & \text{if } x \le d, \\ x - d & \text{if } d < x \le s + d, \\ s & \text{if } x > s + d \end{cases} \tag{1.3.3}$$

(see the graph in Fig.2). If in the last formula d = 0, then we come to (1.3.2), and if $s = \infty$, we have (1.3.1).

FIGURE 2. The payment function in the case of deductible and limit payment.

Another type of insurance with deductible is the franchise deductible insurance with the payment function

$$r(x) = r_{4d}(x) = \begin{cases} 0 & \text{if } x \le d, \\ x & \text{if } x > d. \end{cases} \tag{1.3.4}$$


That is, if the loss exceeds the deductible, the loss is covered in full.
One more type is proportional or quota share insurance where

$$r(x) = r_{5k}(x) = kx \tag{1.3.5}$$

for a positive k < 1.
Certainly, there exist more complicated policies. For instance, in auto-insurance, limits 
for payments for different types of losses (car damage, medical expenses) are different. 
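The payment functions (1.3.1)-(1.3.5) are simple enough to be written out in code, which also makes the relations between them visible. This is a minimal sketch with hypothetical function names; amounts are plain floating-point numbers in an arbitrary unit of money.

```python
def r_deductible(x, d):
    """Ordinary deductible (1.3.1): pay the excess of the loss over d."""
    return max(0.0, x - d)

def r_limit(x, s):
    """Limit payment (1.3.2): pay the loss, but not more than s."""
    return min(x, s)

def r_deductible_limit(x, d, s):
    """Deductible d combined with limit payment s, as in (1.3.3)."""
    return min(max(0.0, x - d), s)

def r_franchise(x, d):
    """Franchise deductible (1.3.4): a loss above d is covered in full."""
    return x if x > d else 0.0

def r_quota_share(x, k):
    """Proportional (quota share) insurance (1.3.5)."""
    return k * x

for loss in (50.0, 300.0, 1000.0):
    print(loss, r_deductible_limit(loss, d=100.0, s=500.0), r_franchise(loss, d=100.0))
```

With d = 0 the combined function reduces to `r_limit`, and with an infinite limit (`s = float("inf")`) it reduces to `r_deductible`, mirroring the remark after (1.3.3).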
Our goal is to write the d.f. of Y, its expectation, variance and other moments. How to compute moments is clear since, in view of (1.2.2) and the fact that r(0) = 0,

$$E\{Y^{k}\} = E\{r^{k}(X)\} = \int_0^{\infty} r^{k}(x)\,dF_X(x) = r^{k}(0)P(X = 0) + q\int_0^{\infty} r^{k}(x)\,dF_\xi(x) = q\int_0^{\infty} r^{k}(x)\,dF_\xi(x). \tag{1.3.6}$$

Sometimes, it is convenient to compute the last integral by parts or use the general representation (0.2.2.1) with $u(x) = r^{k}(x)$, which leads to

$$E\{Y^{k}\} = q\int_0^{\infty} (1 - F_\xi(x))\,dr^{k}(x) = qk\int_0^{\infty} r^{k-1}(x)\, r'(x)\,(1 - F_\xi(x))\,dx = qk\int_0^{\infty} r^{k-1}(x)\, r'(x)\, \bar F_\xi(x)\,dx. \tag{1.3.7}$$

(See Section 0.2.2 for details. Note also that, as in Section 0.2.2, we used the fact that r(0) = 0.)


Let $r(x) = r_{3ds}(x)$ defined in (1.3.3). Then $r'(x) = 1$ if $x \in (d, d + s)$, and $= 0$ for x < d and x > d + s. At x = d and x = d + s we view $r'(x)$ as a function having jumps; this does not matter for integration. Consequently, by (1.3.7),

$$E\{Y^{k}\} = qk\int_d^{d+s} (x - d)^{k-1} \bar F_\xi(x)\,dx. \tag{1.3.8}$$

For k = 1,

$$E\{Y\} = q\int_d^{d+s} \bar F_\xi(x)\,dx. \tag{1.3.9}$$

Note also that if we are interested in the moments of the payment r.v. given that the loss event has occurred, we should just set q = 1 in (1.3.7)-(1.3.9). Indeed, in this case, the payment is equal to $r(\xi)$ and $E\{r^{k}(\xi)\} = \int_0^{\infty} r^{k}(x)\,dF_\xi(x)$. Integrating by parts as we did in (1.3.7), we come to the right members in (1.3.7)-(1.3.9) with q = 1.


EXAMPLE 1. Let $\xi$ be exponentially distributed with parameter a and, consequently, $E\{\xi\} = 1/a$. Let $r(x) = r_{3ds}(x)$ as defined in (1.3.3). Then, by (1.3.8),

$$E\{Y^{k}\} = qk\int_d^{d+s} (x - d)^{k-1} e^{-ax}\,dx.$$

The last integral is standard. In particular,

$$E\{Y\} = q\int_d^{d+s} e^{-ax}\,dx = \frac{q}{a} e^{-ad}\left(1 - e^{-as}\right),$$

and

$$E\{Y^2\} = 2q\int_d^{d+s} (x - d) e^{-ax}\,dx = \frac{2q}{a^2} e^{-ad}\left(1 - (1 + as)e^{-as}\right).$$

(The reader is invited to compute the last integral by parts or look it up in a Calculus textbook, e.g., [136].) Hence,

$$\mathrm{Var}\{Y\} = \frac{2q}{a^2} e^{-ad}\left(1 - (1 + as)e^{-as}\right) - \frac{q^2}{a^2} e^{-2ad}\left(1 - e^{-as}\right)^2 = \frac{q}{a^2} e^{-ad}\left[2\left(1 - (1 + as)e^{-as}\right) - q e^{-ad}\left(1 - e^{-as}\right)^2\right].$$

When there is no limit payment ($s = \infty$),

$$E\{Y\} = \frac{q}{a} e^{-ad}, \qquad \mathrm{Var}\{Y\} = \frac{q}{a^2} e^{-ad}\left[2 - q e^{-ad}\right]. \tag{1.3.10}$$

If the coverage is full (d = 0), we come back to (1.2.8).
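The closed-form expressions of Example 1 can be cross-checked against a direct numerical evaluation of (1.3.8). The sketch below uses a plain midpoint rule; the parameter values q = 0.2, a = 1, d = 0.5, s = 2 and the number of subintervals are assumptions chosen only for illustration.

```python
import math

def moments_closed_form(q, a, d, s):
    """E{Y} and Var{Y} from Example 1: exponential loss, deductible d, limit payment s."""
    ey = q / a * math.exp(-a * d) * (1.0 - math.exp(-a * s))
    ey2 = 2.0 * q / a ** 2 * math.exp(-a * d) * (1.0 - (1.0 + a * s) * math.exp(-a * s))
    return ey, ey2 - ey ** 2

def moment_numeric(k, q, a, d, s, n=20000):
    """Midpoint-rule approximation of (1.3.8) with the tail exp(-a x)."""
    h = s / n
    total = 0.0
    for i in range(n):
        x = d + (i + 0.5) * h
        total += (x - d) ** (k - 1) * math.exp(-a * x)
    return q * k * total * h

q, a, d, s = 0.2, 1.0, 0.5, 2.0     # assumed values, for illustration only
ey, var_y = moments_closed_form(q, a, d, s)
ey_num = moment_numeric(1, q, a, d, s)
ey2_num = moment_numeric(2, q, a, d, s)
print(ey, ey_num)
print(var_y, ey2_num - ey_num ** 2)
```

The two columns agree to many digits, which also guards against sign or factor slips when the formulas are re-derived for other severity distributions.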


EXAMPLE 2 ([153, N35]²). An insurance company offers two types of policies, Type Q and Type R. Type Q has no deductible but a policy limit of 3,000. Type R has no limit but an ordinary deductible of d. Losses follow the Pareto distribution (1.1.18) with $\theta = 2{,}000$ and $\alpha = 3$. Calculate the deductible d such that both policies have the same expected cost per loss.

The word “ordinary” distinguishes this policy from the franchise deductible policy. As we saw in Section 1.1.3, the parameter $\theta$ in (1.1.18) is a scale parameter. So, if we choose 2,000 as a unit of money, then in accordance with (1.1.18), we have $\bar F_\xi(x) = 1/(1 + x)^3$. Since we are considering losses, we set q = 1 in (1.3.9). The reader may see below that if we do not do that, q will cancel anyway. Thus, for Type Q, we have d = 0, the policy limit s = 1.5, and the expected cost is $\int_0^{1.5} (1 + x)^{-3}\,dx = 0.42$. For Type R, $s = \infty$, and the expected cost equals $\int_d^{\infty} (1 + x)^{-3}\,dx = \frac{1}{2(1 + d)^2}$. Thus, $\frac{1}{2(1 + d)^2} = 0.42$, which gives $d \approx 0.091$. Thus, the answer is 0.091 × 2000 = 182.
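The computation of Example 2 can be reproduced in a few lines. The sketch below works in units of θ = 2,000, as in the text; the function names are hypothetical, and the closed-form inversion relies on the Pareto tail $(1 + x)^{-\alpha}$ with α = 3.

```python
def expected_cost_limit(s, alpha=3.0):
    """Integral of (1 + x)^(-alpha) over (0, s): no deductible, policy limit s."""
    return (1.0 - (1.0 + s) ** (1.0 - alpha)) / (alpha - 1.0)

def expected_cost_deductible(d, alpha=3.0):
    """Integral of (1 + x)^(-alpha) over (d, infinity): deductible d, no limit."""
    return (1.0 + d) ** (1.0 - alpha) / (alpha - 1.0)

def solve_deductible(target, alpha=3.0):
    """Deductible d for which expected_cost_deductible(d) equals target."""
    return ((alpha - 1.0) * target) ** (1.0 / (1.0 - alpha)) - 1.0

theta = 2000.0
cost_q = expected_cost_limit(3000.0 / theta)   # Type Q, in units of theta
d_units = solve_deductible(cost_q)             # Type R deductible, in units of theta
d_money = theta * d_units
print(cost_q, d_units, d_money)
```

Without intermediate rounding the deductible comes out slightly above 182, consistent with the rounded value 0.091 × 2000 = 182 given in the example.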


EXAMPLE 3 ([153, N4]³). Well-Traveled Insurance company sells a travel insurance
policy that reimburses travelers for any expenses incurred for a planned vacation that is
canceled because of airline bankruptcies. Individual claims follow the Pareto distribution
(1.1.18) with $\theta = 500$ and $\alpha = 2$. Because of financial difficulties in the airline industry,
Well-Traveled imposes a limit of $1000 on each claim. If a policyholder’s planned vacation
is canceled due to airline bankruptcies and he or she has incurred more than $1000 in
expenses, what is the expected non-reimbursed amount of the claim?

This problem is similar to Example 2. First, we choose $500 as a unit of money. The main point in our problem is that we consider the distribution of the loss $\xi$ given that $\xi > 2$ units. The tail of the conditional distribution is

$$P(\xi > x \mid \xi > 2) = \frac{P(\xi > x)}{P(\xi > 2)} = \frac{(1 + x)^{-2}}{(1 + 2)^{-2}} = \frac{9}{(1 + x)^2} \quad \text{for } x \ge 2.$$

Note that $P(\xi > x \mid \xi > 2) = 1$ if x < 2. Then, using the general formula (0.2.2.2), we have

$$E\{\xi \mid \xi > 2\} = \int_0^{\infty} P(\xi > x \mid \xi > 2)\,dx = \int_0^{2} dx + \int_2^{\infty} \frac{9\,dx}{(1 + x)^2} = 2 + 3 = 5.$$

In the case $\xi > 2$, the company pays 2 units, so on the average, the expected non-reimbursed amount equals 3 units or $1,500.

²Reprinted with permission of the Casualty Actuarial Society.

³Reprinted with permission of the Casualty Actuarial Society.
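The answer of Example 3 is a particular case of a simple closed form. For the tail (1.1.18) with α > 1, integrating the tail from a level L to infinity and dividing by its value at L gives the mean excess E{ξ - L | ξ > L} = (θ + L)/(α - 1). The sketch below implements this formula; the function names are hypothetical, and the closed form is a consequence of (1.1.18) worked out here rather than a formula quoted from the text.

```python
def pareto_tail(x, theta, alpha):
    """Tail (1.1.18): (theta / (x + theta))^alpha for x >= 0."""
    return (theta / (x + theta)) ** alpha

def mean_excess(limit, theta, alpha):
    """E{xi - limit | xi > limit} for the tail (1.1.18); requires alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("the mean is infinite for alpha <= 1")
    return (theta + limit) / (alpha - 1.0)

# Example 3 in dollars: theta = 500, alpha = 2, limit 1000.
print(mean_excess(1000.0, 500.0, 2.0))   # expected non-reimbursed amount
# The same in units of theta = 500: limit 2 units.
print(mean_excess(2.0, 1.0, 2.0))
```

The mean excess grows linearly with the limit, a characteristic feature of Pareto tails: the larger the threshold already exceeded, the larger the expected overshoot.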


Next, we consider the d.f. $F_Y(x)$. At the end of this section, we provide a general formula, but it is worth noting that in particular cases it is often easier to proceed from particular features of these cases rather than from a general representation. We consider two such cases.

Let $r(x) = r_{3ds}(x)$ from (1.3.3). First, the payment is greater than zero if the loss event occurs (with probability q) and the loss is greater than the deductible (the probability is $\bar F_\xi(d)$). This leads to

$$P(Y = 0) = P(X \le d) = 1 - q\bar F_\xi(d). \tag{1.3.11}$$

For $0 \le y < s$, the payment is greater than y if the loss event occurs and the loss $\xi > d + y$. The probability of this is $q\bar F_\xi(y + d)$. Furthermore, the payment is equal to s if the loss event occurs and the loss is greater than or equal to d + s. So,

$$P(Y = s) = P(X \ge d + s) = qP(\xi \ge d + s) = q(1 - P(\xi < d + s)). \tag{1.3.12}$$

Also, $P(Y \le s) = 1$, and $P(Y < 0) = 0$.
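Thus, eventually, for the payment function (1.3.3), the d.f. of the payment Y is

$$F_Y(y) = \begin{cases} 0 & \text{if } y < 0, \\ 1 - q\bar F_\xi(y + d) & \text{if } 0 \le y < s, \\ 1 & \text{if } y \ge s. \end{cases} \tag{1.3.13}$$

Certainly, if $s = \infty$ (no limit payment), then we have

$$F_Y(y) = \begin{cases} 0 & \text{if } y < 0, \\ 1 - q\bar F_\xi(y + d) & \text{if } y \ge 0. \end{cases} \tag{1.3.14}$$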


EXAMPLE 4. Again let $\xi$ be exponentially distributed with a parameter a. Then $\bar F_\xi(x) = e^{-ax}$. After substitution into (1.3.13), we have

$$F_Y(y) = \begin{cases} 0 & \text{if } y < 0, \\ 1 - q e^{-a(y + d)} & \text{if } 0 \le y < s, \\ 1 & \text{if } y \ge s. \end{cases} \tag{1.3.15}$$

FIGURE 3.

If $s = \infty$ (no limit payment), then

$$F_Y(y) = \begin{cases} 0 & \text{if } y < 0, \\ 1 - q e^{-a(y + d)} & \text{if } y \ge 0. \end{cases} \tag{1.3.16}$$

The graph for (1.3.15) is given in Fig.3 and has two jumps. The first is the jump at 0 with a size of $1 - q e^{-ad}$.

The second jump is at the point s and has the size $q e^{-a(s + d)}$. (Certainly, if $s = \infty$, we have just one jump at zero.) To understand the significance of this instance, we should again recall that the jump of a d.f. at a point c equals the probability that the corresponding r.v. is exactly equal to c. Thus,

$$P(Y = 0) = 1 - q e^{-ad}, \qquad P(Y = s) = q e^{-a(s + d)}.$$

(Compare with Example 1.2-1. Now P(Y = 0) is larger due to the deductible.)


Coming back to (1.3.13), we see that, as in the last example, the d.f. has at least two
jumps: at 0 and at s. These jumps are equal to the respective probabilities that Y will
assume the values 0 and s; see (1.3.11)-(1.3.12).
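The two jumps of the d.f. (1.3.15) can be observed numerically by evaluating the function just to the left of, and exactly at, the points 0 and s. The sketch below assumes an exponential severity, as in Example 4; the parameter values and the step `eps` used to approach a point from the left are illustrative choices.

```python
import math

def payment_cdf(y, q, a, d, s=math.inf):
    """F_Y(y) from (1.3.15): exponential loss with rate a, deductible d, limit payment s."""
    if y < 0.0:
        return 0.0
    if y >= s:
        return 1.0
    return 1.0 - q * math.exp(-a * (y + d))

q, a, d, s = 0.1, 1.0, 0.5, 2.0      # assumed values, for illustration only
eps = 1e-12                          # small step to approach a point from the left

jump_at_zero = payment_cdf(0.0, q, a, d, s) - payment_cdf(-eps, q, a, d, s)
jump_at_s = payment_cdf(s, q, a, d, s) - payment_cdf(s - eps, q, a, d, s)
print(jump_at_zero, 1.0 - q * math.exp(-a * d))    # P(Y = 0)
print(jump_at_s, q * math.exp(-a * (s + d)))       # P(Y = s)
```

Between the two jumps the d.f. is continuous, so Y is a mixed r.v.: it has atoms at 0 and at s and a density on (0, s).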

Next, we briefly consider the franchise deductible type policy. In this case, since r(x) = 0 for $0 \le x \le d$, and r(x) = x for x > d, from (1.3.6) it follows that

$$E\{Y^{k}\} = q\int_d^{\infty} x^{k}\,dF_\xi(x). \tag{1.3.17}$$

Furthermore, for $0 \le y \le d$, we have $P(Y \le y) = P(Y = 0) = 1 - qP(\xi > d)$. If y > d, then $P(Y \le y) = 1 - qP(\xi > y)$. Thus,

$$F_Y(y) = \begin{cases} 0 & \text{if } y < 0, \\ 1 - q\bar F_\xi(d) & \text{if } 0 \le y \le d, \\ 1 - q\bar F_\xi(y) & \text{if } y > d. \end{cases} \tag{1.3.18}$$


In Exercise 14, we will consider particular examples. 




Now, let us write a general formula for the d.f. $F_Y(y)$. Assume that r(x) is non-decreasing and continuous at any x, perhaps, except one point d such that r(x) = 0 for x < d, and r(x) > 0 for x > d. See, for instance, Fig.4. For a payment function this restriction is fairly mild. As an example, one may consider the function $r_{4d}(x)$ in (1.3.4) for the franchise deductible insurance type.

FIGURE 4.

Let $y \ge 0$. Obviously, the d.f.

$$F_Y(y) = P(Y \le y) = P(r(X) \le y).$$




We define the inverse $r^{-1}(y)$ in the same manner as we defined the inverse of a d.f. in Section 0.3.2.1. For the type of payment functions r(x) we consider, we may do it more explicitly, setting $r^{-1}(y) = \max\{x : r(x) \le y\}$. The definition is illustrated in Fig.4. Note that the maximum above may be equal to infinity, which, as we will see, cannot cause any problem. For example, if $r(x) = r_{3ds}(x)$ defined in (1.3.3), then $r^{-1}(s) = \infty$, as one can see from Fig.2.

For the inverse so defined, the event $\{r(X) \le y\} = \{X \le r^{-1}(y)\}$ for any $y \ge 0$, which may be easily seen, for example, from Fig.4. Thus, by virtue of (1.2.2),

$$F_Y(y) = P(X \le r^{-1}(y)) = F_X(r^{-1}(y)) = 1 - q\bar F_\xi(r^{-1}(y)). \tag{1.3.19}$$


Consider, as an example, $r(x) = r_{3ds}(x)$ from (1.3.3). In this case, r(x) is increasing on (d, d + s) and takes values from the interval (0, s); see again Fig.2. Then $r^{-1}(y) = y + d$ for $y \in [0, s)$ and for such y's,

$$F_Y(y) = 1 - q + qF_\xi(y + d).$$

Since $r^{-1}(s) = \infty$,

$$F_Y(s) = 1 - q\bar F_\xi(\infty) = 1 - q \cdot 0 = 1,$$

which is natural because s is a limit payment. All of this leads again to (1.3.13).


Now, let us revisit the ordinary deductible case. 

Obviously, the inclusion of a deductible into a policy decreases the payment. The ratio of the expected value of this decrease to the expected payment without a deductible is called the loss elimination ratio (l.e.r.). Note that in the ordinary deductible case (1.3.1), the payment may also be written as $Y = \max\{0, X - d\}$, and the decrease mentioned as $\min\{X, d\}$. Thus, the l.e.r. equals

$$\frac{E\{X\} - E\{\max\{0, X - d\}\}}{E\{X\}} = \frac{E\{\min\{X, d\}\}}{E\{X\}}.$$


If $X = 0$, which occurs with probability $1 - q$, then $\min\{X, d\} = 0$. If $X = \xi$, which occurs with probability $q$, then $\min\{X, d\} = \min\{\xi, d\}$. Hence, $E\{\min\{X, d\}\} = qE\{\min\{\xi, d\}\}$. Since $E\{X\} = qE\{\xi\}$,

$$\text{l.e.r.} = \frac{E\{\min\{\xi, d\}\}}{E\{\xi\}} \qquad (1.3.20)$$

and does not depend on $q$.

To compute $E\{\min\{\xi, d\}\}$, we again use (0.2.2.1), setting $u(x) = \min\{x, d\}$ and writing

$$E\{\min\{\xi, d\}\} = \int_0^\infty (1 - F_\xi(x))\, d(\min\{x, d\}). \qquad (1.3.21)$$

For $u(x) = \min\{x, d\}$, the derivative $u'(x) = 1$ if $x < d$, and $= 0$ if $x > d$. Formally, the derivative $u'(x)$ does not exist at $d$, but this does not matter for integration: we may view $u'(x)$ as a function having a jump at $d$. Thus, from (1.3.21) it follows that

$$E\{\min\{\xi, d\}\} = \int_0^d (1 - F_\xi(x))\, dx. \qquad (1.3.22)$$




EXAMPLE 5. Let $\xi$ be exponential with parameter $a$. We have

$$E\{\min\{\xi, d\}\} = \int_0^d e^{-ax}\, dx = \frac{1}{a}\left(1 - e^{-ad}\right).$$

Since $E\{\xi\} = 1/a$, from (1.3.20) we get that the l.e.r. is $1 - e^{-ad}$. For $d = 0$, we naturally come to zero loss elimination.
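
The computation in Example 5 can be checked numerically. The following is a minimal sketch, not part of the original text: it approximates the integral in (1.3.22) by the midpoint rule and compares the resulting l.e.r. with the closed form $1 - e^{-ad}$. The function names and the values $a = 0.5$, $d = 2$ are chosen here purely for illustration.

```python
import math

def expected_min(tail, d, steps=10_000):
    """Approximate E{min(xi, d)} = integral_0^d (1 - F(x)) dx, cf. (1.3.22), by the midpoint rule."""
    h = d / steps
    return h * sum(tail((i + 0.5) * h) for i in range(steps))

def ler_exponential(a, d):
    """Loss elimination ratio (1.3.20) for an exponential loss with parameter a."""
    tail = lambda x: math.exp(-a * x)   # 1 - F(x) for the exponential d.f.
    mean = 1.0 / a                      # E{xi} for the exponential distribution
    return expected_min(tail, d) / mean

a, d = 0.5, 2.0                         # illustrative values only
print(ler_exponential(a, d), 1 - math.exp(-a * d))
```

The two printed numbers should agree up to the discretization error of the midpoint rule.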


EXAMPLE 6. Let $\xi$ have the Pareto distribution defined in (1.1.17) with parameter $\alpha > 1$. A company covers the risk without a limit payment but with a deductible $d$. What should $d$ be for the company's mean payment to constitute 90% of the mean payment without a deductible?

This means that the l.e.r. equals 0.1. Since $\xi \ge 1$, we should distinguish two cases: $d \le 1$ and $d > 1$.

If $d \le 1$, then $\min\{\xi, d\} = d$ because $\xi \ge 1$. Therefore, in this case, $E\{\min\{\xi, d\}\} = d$.

If $d > 1$, using (1.3.22) and (1.1.17), we get

$$E\{\min\{\xi, d\}\} = \int_0^1 (1 - 0)\,dx + \int_1^d \frac{1}{x^{\alpha}}\,dx = 1 + \frac{1}{\alpha - 1}\left(1 - \frac{1}{d^{\alpha - 1}}\right) = \frac{\alpha}{\alpha - 1} - \frac{1}{(\alpha - 1)d^{\alpha - 1}}.$$

Since $E\{\xi\} = \alpha/(\alpha - 1)$ [see (1.1.19)], we obtain

$$\text{l.e.r.} = 1 - \frac{1}{\alpha d^{\alpha - 1}}. \qquad (1.3.23)$$

We should determine the value of $d$ for which the l.e.r. is equal to 0.1. The solution depends on $\alpha$. For $d = 1$, the l.e.r. equals $(\alpha - 1)/\alpha$, and this quantity equals 0.1 for $\alpha = 10/9$. Hence, if $\alpha > 10/9$, we should consider the case $d \le 1$ and write $\frac{\alpha - 1}{\alpha}\, d = 0.1$. This will lead to $d = \frac{\alpha}{10(\alpha - 1)}$. [It becomes clear if we graph the function (1.3.23).] If $\alpha < 10/9$, we write $1 - \frac{1}{\alpha d^{\alpha - 1}} = 0.1$ and get $d = \left(\frac{10}{9\alpha}\right)^{1/(\alpha - 1)}$.
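
The two-case solution of Example 6 may be sketched in code as follows. This is an illustration under the assumption, consistent with the example, that the Pareto loss satisfies $\xi \ge 1$ with tail $x^{-\alpha}$ for $x \ge 1$; the function names and the generalization to an arbitrary target l.e.r. are not from the original text.

```python
def pareto_ler(alpha, d):
    """Loss elimination ratio for a Pareto loss with xi >= 1 and parameter alpha > 1."""
    if d <= 1:
        # min{xi, d} = d, and E{xi} = alpha / (alpha - 1)
        return d * (alpha - 1) / alpha
    return 1 - 1 / (alpha * d ** (alpha - 1))      # formula (1.3.23)

def deductible_for_ler(alpha, target=0.1):
    """Deductible d at which the l.e.r. equals `target` (hypothetical helper)."""
    ler_at_one = (alpha - 1) / alpha               # l.e.r. for d = 1
    if target <= ler_at_one:
        return target * alpha / (alpha - 1)        # case d <= 1
    return (1 / ((1 - target) * alpha)) ** (1 / (alpha - 1))   # case d > 1

for alpha in (1.05, 2.0, 5.0):                     # illustrative values only
    d = deductible_for_ler(alpha)
    print(alpha, d, pareto_ler(alpha, d))
```

For each $\alpha$, the last printed value should be close to 0.1, which confirms that the chosen branch is the right one.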


In conclusion, we will consider the effect of inflation. Assume that there is a gap between the moments of loss and payment, and the insurer is obligated to cover inflation losses. From a modeling point of view, this means that at the moment of payment, instead of the loss $\xi$, the insurer should proceed from the amount $(1+v)\xi$, where $v$ is the inflation rate during the period under consideration. The point is that the deductible is subtracted after inflation has been taken into account. In the ordinary deductible case (1.3.1), we may consider the problem as follows.

First, note that (1.3.1) may be rewritten as $r(x) = \max\{0, x - d\}$. In view of (1.1.2), without inflation, we would have $Y = I \cdot \max\{0, \xi - d\}$. (If there are no losses ($I = 0$), then $Y = 0$, and in the case of a loss $\xi$, the payment $Y = \max\{0, \xi - d\}$.)




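In the case of inflation, we should replace $\xi$ by $(1+v)\xi$, and hence the payment

$$Y = I \cdot \max\{0, (1+v)\xi - d\} = I \cdot \max\left\{0, (1+v)\left(\xi - \frac{d}{1+v}\right)\right\} = (1+v)\, I \cdot \max\left\{0, \xi - \frac{d}{1+v}\right\}. \qquad (1.3.24)$$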


It appears as if we applied a smaller deductible, $d/(1+v)$, and then multiplied the payment by the inflation coefficient $(1+v)$.

To illustrate this, consider the simple case when $\xi$ is exponential with parameter $a$. In (1.3.10), we got that the expected payment $E\{Y\} = \frac{q}{a}\exp\{-ad\}$. To consider inflation, we should replace $\xi$ by $(1+v)\xi$. The latter r.v. is also exponential, with parameter $a/(1+v)$. Thus, replacing $a$ by $a/(1+v)$, we have

$$E\{Y\} = \frac{q(1+v)}{a}\exp\{-ad/(1+v)\}. \qquad (1.3.25)$$

However, we can arrive at (1.3.25) in another way. Namely, we could just replace in (1.3.10) the deductible $d$ by $d/(1+v)$, which would lead to $\frac{q}{a}\exp\{-ad/(1+v)\}$. After that, we multiply the result by $(1+v)$, which again leads to (1.3.25). Certainly, in this simple case, it does not matter which way we choose. In more complex situations, the latter technique allows us to avoid calculations when inflation is considered. The discussion is continued in Exercise 15.
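
The equivalence of the two routes to the expected payment under inflation can be illustrated numerically. The sketch below assumes an exponential loss with parameter $a$ and loss probability $q$, as in the discussion above; the third function is an independent check by direct midpoint-rule integration. All function names and numerical values are illustrative assumptions.

```python
import math

def expected_payment(a, d, q):
    """Expected payment with deductible d and no inflation: (q / a) * exp(-a d)."""
    return q / a * math.exp(-a * d)

def inflated_by_rescaled_loss(a, d, q, v):
    """Route 1: (1 + v) * xi is exponential with parameter a / (1 + v)."""
    return expected_payment(a / (1 + v), d, q)

def inflated_by_reduced_deductible(a, d, q, v):
    """Route 2: use the deductible d / (1 + v), then multiply the result by (1 + v)."""
    return (1 + v) * expected_payment(a, d / (1 + v), q)

def inflated_numeric(a, d, q, v, upper=50.0, steps=100_000):
    """Direct check: q * integral of max(0, (1+v)x - d) * a * exp(-a x) dx (midpoint rule)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += max(0.0, (1 + v) * x - d) * a * math.exp(-a * x)
    return q * h * total

args = (1.0, 0.5, 0.1, 0.05)            # a, d, q, v: illustrative values only
print(inflated_by_rescaled_loss(*args),
      inflated_by_reduced_deductible(*args),
      inflated_numeric(*args))
```

The first two numbers coincide up to floating-point rounding; the third agrees with them up to the discretization error.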


2 THE AGGREGATE PAYMENT 


In this section, we do not specify particular details of insurance contracts such as deductibles or payment limits. So here, we use habitual notation and denote by $X$'s the payments provided by a company to particular clients.

Consider a group consisting of a fixed number $n$ of clients. Let $X_i$ be the payment to the $i$th client. Then the cumulative payment

$$S = S_n = X_1 + \ldots + X_n.$$

In this context, the r.v.'s $X_i$ are sometimes called severities. The object of our study is the distribution function and other characteristics of $S$. Unless stated otherwise, we assume the $X_i$'s to be independent. If the $X$'s are also identically distributed, we call the group homogeneous.


There are several approaches to this problem. 




2.1 Convolutions 
2.1.1 Definition and examples 


Let $F_i(x)$ be the d.f. of $X_i$. Consider first the case when $n = 2$, so $S = X_1 + X_2$. The basic fact is that if $X_1$ and $X_2$ are independent, then the d.f. of $S$ is

$$F_S(x) = \int_{-\infty}^{\infty} F_1(x - y)\, dF_2(y). \qquad (2.1.1)$$

A proof of (2.1.1) may be found in many books on Probability, e.g., in [102], [116], [120]. The idea is to consider the vector $(X_1, X_2)$ whose values are points $(x_1, x_2)$ in the plane. Then $P(X_1 + X_2 \le x)$ is the probability that the value of the vector $(X_1, X_2)$ will be in the region $\{(x_1, x_2) : x_1 + x_2 \le x\}$; see Fig.5. Since $X_1$, $X_2$ are independent, this probability may be written as the double integral

$$\iint_{x_1 + x_2 \le x} dF_1(x_1)\, dF_2(x_2). \qquad (2.1.2)$$

FIGURE 5.

Direct integration will lead to (2.1.1) if we replace $x_2$ by $y$. The reader can provide details of the integration on her/his own or look at proofs, e.g., in [102], [116], [120].


The operation (2.1.1) is called convolution and is denoted in symbols as $F_1 * F_2$.

Once we have “convoluted” $F_1$ and $F_2$, we can continue, adding to $S_2$ a third r.v. $X_3$. In accordance with (2.1.1), in this case, the d.f. $F_{S_2}$ must be convoluted with $F_3$. This leads to $F_{S_3} = F_{S_2} * F_3 = F_1 * F_2 * F_3$. Continuing this process, we get

$$F_{S_n} = F_1 * \ldots * F_n.$$


Examples will be considered a bit later. 

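Assume that the $X$'s are continuous r.v.'s and for each $i$, there exists the probability density $f_i(x) = F_i'(x)$. Then, differentiating (2.1.1), for $n = 2$, we get that the density of $S$ is

$$f_S(x) = \frac{d}{dx}\int_{-\infty}^{\infty} F_1(x - y)\, dF_2(y) = \frac{d}{dx}\int_{-\infty}^{\infty} F_1(x - y) f_2(y)\, dy = \int_{-\infty}^{\infty} \frac{\partial}{\partial x} F_1(x - y) f_2(y)\, dy.$$

Thus,

$$f_S(x) = \int_{-\infty}^{\infty} f_1(x - y) f_2(y)\, dy. \qquad (2.1.3)$$

This operation is denoted by $f_1 * f_2$ and is called the convolution of densities. In general, for an arbitrary integer $n$,

$$f_{S_n} = f_1 * \ldots * f_n.$$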




The counterpart of (2.1.3) for discrete integer-valued r.v.'s is as follows. Let $X_1$, $X_2$ take on values $0, 1, 2, \ldots$, and $f_k^{(i)} = P(X_i = k)$, $i = 1, 2$. Then, setting $f_m = P(S = m)$, for $n = 2$, we have

$$f_m = \sum_{k=0}^{m} f_k^{(1)} f_{m-k}^{(2)}. \qquad (2.1.4)$$

Formula (2.1.4) may be derived from (2.1.1), but the direct proof that we provide now is shorter and more illustrative. The r.v. $S$ is equal to $m$ if the r.v. $X_1$ is equal to some $k$ and $X_2$ to $m - k$. The corresponding probability is $P(X_1 = k, X_2 = m - k) = P(X_1 = k)P(X_2 = m - k) = f_k^{(1)} f_{m-k}^{(2)}$ because $X_1$ and $X_2$ are independent. Summing over $k$ leads to (2.1.4).

Consider the sequences

$$f^{(1)} = (f_0^{(1)}, f_1^{(1)}, \ldots), \quad f^{(2)} = (f_0^{(2)}, f_1^{(2)}, \ldots), \quad f = (f_0, f_1, \ldots).$$

The above sequences of probabilities specify the distributions of $X_1$, $X_2$, and $S$, respectively. Then (2.1.4) may be written in compact form as

$$f = f^{(1)} * f^{(2)}.$$

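EXAMPLE 1. Let independent r.v.'s

$$X_1 = \begin{cases} 1 & \text{with probability } p_1 = \frac{1}{3}, \\ 0 & \text{with probability } q_1 = \frac{2}{3}, \end{cases} \qquad X_2 = \begin{cases} 1 & \text{with probability } p_2 = \frac{1}{2}, \\ 0 & \text{with probability } q_2 = \frac{1}{2}. \end{cases}$$

Clearly, $S$ takes on values 0, 1, 2. The problem is very simple and can be solved directly, but we use this example to demonstrate how (2.1.4) works. We have

$$f_0 = f_0^{(1)} f_0^{(2)} = q_1 q_2 = \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{3},$$

$$f_1 = \sum_{k=0}^{1} f_k^{(1)} f_{1-k}^{(2)} = f_0^{(1)} f_1^{(2)} + f_1^{(1)} f_0^{(2)} = q_1 p_2 + p_1 q_2 = \frac{2}{3} \cdot \frac{1}{2} + \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{2},$$

$$f_2 = \sum_{k=0}^{2} f_k^{(1)} f_{2-k}^{(2)} = f_0^{(1)} f_2^{(2)} + f_1^{(1)} f_1^{(2)} + f_2^{(1)} f_0^{(2)} = q_1 \cdot 0 + p_1 p_2 + 0 \cdot q_2 = \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{6}.$$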
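
Formula (2.1.4) translates directly into a double loop over the two sequences of probabilities. The sketch below is an illustration only: it uses exact rational arithmetic and two two-point distributions with success probabilities 1/3 and 1/2, values chosen here for illustration; the function name `convolve` is hypothetical.

```python
from fractions import Fraction

def convolve(f1, f2):
    """Discrete convolution (2.1.4): f_m = sum_k f1[k] * f2[m - k]."""
    out = [Fraction(0)] * (len(f1) + len(f2) - 1)
    for k, a in enumerate(f1):
        for j, b in enumerate(f2):
            out[k + j] += a * b        # the pair (k, j) contributes to m = k + j
    return out

# f[k] = P(X = k); illustrative two-point distributions
f1 = [Fraction(2, 3), Fraction(1, 3)]
f2 = [Fraction(1, 2), Fraction(1, 2)]
f = convolve(f1, f2)
print(f, sum(f))
```

Applying `convolve` repeatedly gives the distribution of a sum of any number of independent integer-valued terms.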


EXAMPLE 2. This is a classical example. Let $X_1$ and $X_2$ be independent and uniformly distributed on $[0,1]$. Obviously, $S_2 = X_1 + X_2$ takes on values from $[0,2]$, so it suffices to find the density $f_{S_2}(x)$ only in this interval. The densities $f_i(x) = 1$ for $x \in [0,1]$ and $= 0$ otherwise. Hence, by (2.1.3),

$$f_{S_2}(x) = \int_0^1 f_1(x - y)\, dy. \qquad (2.1.5)$$

The integrand $f_1(x - y) = 1$ if $0 \le x - y \le 1$, which is equivalent to $x - 1 \le y \le x$.

Let $0 \le x \le 1$. Then the left inequality holds automatically because $x - 1 \le 0$ while $y \ge 0$. So, $f_1(x - y) = 1$ if $y \le x$, and $= 0$ otherwise. Hence, for $0 \le x \le 1$, we may integrate in (2.1.5) only over $[0, x]$, which implies that

$$f_{S_2}(x) = \int_0^x dy = x.$$




FIGURE 6. For Example 2: (a) the graph of $f_{S_2}$; (b) the graph of $f_{S_3}$.


On the other hand, in view of the symmetry of the distributions of the $X$'s, the density $f_{S_2}(x)$ should be symmetric with respect to the center of $[0,2]$; that is, the point one (see Fig.6a). So, for $1 \le x \le 2$, we should have $f_{S_2}(x) = 2 - x$. Eventually,

$$f_{S_2}(x) = \begin{cases} x & \text{if } 0 \le x \le 1, \\ 2 - x & \text{if } 1 \le x \le 2, \\ 0 & \text{otherwise;} \end{cases} \qquad (2.1.6)$$

see again Fig.6a. This distribution is called triangular. We see that while the values of the $X$'s are equally likely, the values of the sum are not. In Exercise 19, the reader is invited to give a common-sense explanation of this fact.

Now let $X_3$ also be uniformly distributed on $[0,1]$ and independent of $X_1$ and $X_2$. Obviously, the sum $S_3 = X_1 + X_2 + X_3$ assumes values from $[0,3]$. To find its density, we can again apply (2.1.3), replacing $f_2(y)$ by $f_3(y)$, and $f_1(x - y)$ by $f_{S_2}(x - y)$. Thus,

$$f_{S_3}(x) = \int_0^1 f_{S_2}(x - y)\, dy, \qquad (2.1.7)$$

where $f_{S_2}$ is given in (2.1.6). We relegate the somewhat tedious calculations to Exercise 19. The result is

$$f_{S_3}(x) = \begin{cases} x^2/2 & \text{if } 0 \le x \le 1, \\ (-2x^2 + 6x - 3)/2 & \text{if } 1 \le x \le 2, \\ (3 - x)^2/2 & \text{if } 2 \le x \le 3, \\ 0 & \text{otherwise.} \end{cases}$$

The graph is given in Fig.6b.
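
The piecewise formula for the density of $S_3$ can be verified against the integral (2.1.7) computed numerically. The sketch below uses the midpoint rule; the function names and the step count are illustrative assumptions, and the closed form coded here is the standard density of the sum of three independent uniform r.v.'s on $[0,1]$.

```python
def f_s2(x):
    """Triangular density (2.1.6) of the sum of two independent uniforms on [0, 1]."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

def f_s3_closed(x):
    """Piecewise-quadratic density of the sum of three independent uniforms on [0, 1]."""
    if 0 <= x <= 1:
        return x * x / 2
    if 1 < x <= 2:
        return (-2 * x * x + 6 * x - 3) / 2
    if 2 < x <= 3:
        return (3 - x) ** 2 / 2
    return 0.0

def f_s3_numeric(x, steps=2000):
    """Integral (2.1.7): integral_0^1 f_s2(x - y) dy, by the midpoint rule."""
    h = 1.0 / steps
    return h * sum(f_s2(x - (i + 0.5) * h) for i in range(steps))

for x in (0.5, 1.5, 2.5):
    print(x, f_s3_closed(x), f_s3_numeric(x))
```

The two columns should agree up to the discretization error.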


EXAMPLE 3. Let $X_1$ and $X_2$ be independent, $X_1$ be exponential with $E\{X_1\} = 1$, and $X_2$ be uniformly distributed on $[0,1]$. Then the density of the sum is given by

$$f_S(x) = \int_0^1 f_1(x - y)\, dy.$$

The integrand $f_1(x - y) = e^{-(x - y)}$ if $y \le x$, and $= 0$ otherwise. Hence, for $0 \le x \le 1$, we should integrate only up to $x$, which implies that

$$f_S(x) = \int_0^x e^{-(x - y)}\, dy = 1 - e^{-x} \quad \text{for } 0 \le x \le 1.$$

For $x > 1$, we should consider the total $\int_0^1$, and

$$f_S(x) = \int_0^1 e^{-(x - y)}\, dy = e^{-x}(e - 1).$$

The graph for all $x$'s is given in Fig.7.

FIGURE 7.

In the examples above, we saw that the distribution of a sum may essentially differ from the distributions of the separate terms. Next, we consider cases when the convolution inherits properties of individual terms in the sum.


2.1.2 Some classical examples 


I. Sums of normals (normal r.v.’s). 


Proposition 2 Let $X_1$ and $X_2$ be independent normals with expectations $m_1$ and $m_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Then the r.v. $S = X_1 + X_2$ is normal with expectation $m_1 + m_2$ and variance $\sigma_1^2 + \sigma_2^2$. In other words, if $\phi_{m,\sigma^2}$ is the normal density with mean $m$ and variance $\sigma^2$, then

$$\phi_{m_1,\sigma_1^2} * \phi_{m_2,\sigma_2^2} = \phi_{m_1 + m_2,\, \sigma_1^2 + \sigma_2^2}.$$


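We consider a short proof in the next section. Here, as one more example of convolution, we demonstrate the direct convolution for the particular case $m_1 = m_2 = 0$ and $\sigma_1^2 = \sigma_2^2 = 1$. Denoting by $\phi(x)$ the standard normal density and applying (2.1.3), we have

$$f_S(x) = \int_{-\infty}^{\infty} \phi(x - y)\phi(y)\, dy = \int_{-\infty}^{\infty} (2\pi)^{-1/2}\exp\{-(x - y)^2/2\}\, (2\pi)^{-1/2}\exp\{-y^2/2\}\, dy$$

$$= (2\pi)^{-1}\int_{-\infty}^{\infty} \exp\left\{-\frac{(x - y)^2}{2} - \frac{y^2}{2}\right\} dy = (2\pi)^{-1}\int_{-\infty}^{\infty} \exp\left\{xy - y^2 - \frac{1}{2}x^2\right\} dy.$$

Completing the square, we write $xy - y^2 - \frac{1}{2}x^2 = -(y - x/2)^2 - \frac{1}{4}x^2$. Then

$$f_S(x) = (2\pi)^{-1}\exp\{-x^2/4\}\int_{-\infty}^{\infty} \exp\{-(y - x/2)^2\}\, dy.$$

As is straightforward to verify, with the change of variables $y - x/2 = s/\sqrt{2}$, we may write

$$f_S(x) = \frac{1}{\sqrt{2\pi}\sqrt{2}}\exp\{-x^2/4\} \cdot \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\{-s^2/2\}\, ds.$$

The last factor is the integral of the standard normal density and, consequently, equals one. The expression before it is the normal density with zero mean and a variance of two. ∎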


II. Sums of Poisson r.v.’s. 


Proposition 3 Let $X_1$ and $X_2$ be independent Poisson r.v.'s with parameters $\lambda_1$ and $\lambda_2$, respectively. Then the r.v. $S = X_1 + X_2$ is a Poisson r.v. with parameter $\lambda_1 + \lambda_2$. In other words, if $\pi_\lambda$ is the Poisson distribution with mean $\lambda$, then

$$\pi_{\lambda_1} * \pi_{\lambda_2} = \pi_{\lambda_1 + \lambda_2}.$$


Proof. Again, to demonstrate the convolution procedure, we give a direct proof. A shorter proof using m.g.f.'s is given in the next section. By virtue of (2.1.4), for $f_m = P(S = m)$, we have

$$f_m = \sum_{k=0}^{m} e^{-\lambda_1}\frac{\lambda_1^k}{k!}\, e^{-\lambda_2}\frac{\lambda_2^{m-k}}{(m-k)!} = e^{-(\lambda_1 + \lambda_2)}\sum_{k=0}^{m} \frac{1}{k!\,(m-k)!}\lambda_1^k \lambda_2^{m-k} = e^{-(\lambda_1 + \lambda_2)}\frac{1}{m!}\sum_{k=0}^{m} \binom{m}{k}\lambda_1^k \lambda_2^{m-k}.$$

The last sum above is the binomial expansion of $(\lambda_1 + \lambda_2)^m$, which leads to the Poisson formula for the probability $f_m$. ∎
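
Proposition 3 is easy to check numerically by evaluating the sum (2.1.4) for two Poisson distributions and comparing it with the Poisson probability for the summed parameter. The sketch below is an illustration; the function names and the parameters 1.5 and 2.5 are assumptions made here.

```python
import math

def poisson_pmf(lam, k):
    """P(X = k) for a Poisson r.v. with parameter lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def convolved_pmf(lam1, lam2, m):
    """P(X1 + X2 = m) computed by the convolution formula (2.1.4)."""
    return sum(poisson_pmf(lam1, k) * poisson_pmf(lam2, m - k) for k in range(m + 1))

lam1, lam2 = 1.5, 2.5                   # illustrative parameters
for m in range(5):
    print(m, convolved_pmf(lam1, lam2, m), poisson_pmf(lam1 + lam2, m))
```

The two columns coincide up to floating-point rounding.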


III. Sums of Γ-distributed r.v.'s. We call a r.v. a Γ-r.v. if it has a Γ-distribution.


Proposition 4 Let $X_1$ and $X_2$ be independent Γ-r.v.'s with parameters $(a, \nu_1)$ and $(a, \nu_2)$, respectively. (Notice that the scale parameter $a$ is the same.) Then the r.v. $S = X_1 + X_2$ is a Γ-r.v. with parameters $(a, \nu_1 + \nu_2)$. In other words, if $f_{a,\nu}$ denotes the Γ-density with parameters $(a, \nu)$, then

$$f_{a,\nu_1} * f_{a,\nu_2} = f_{a,\nu_1 + \nu_2}. \qquad (2.1.8)$$


This is a very important fact having many applications in diverse areas such as physics, 
economics, etc., and, of course, in insurance, which we will see repeatedly. 

In Exercise 21, the reader is encouraged to carry out a direct proof using (2.1.3). Exercise 
21 contains some suggestions as to how to proceed. The reader also can take a look, e.g., 
at [122, p.192]. In the next section, we prove Proposition 4 by using m.g.f’s. 


EXAMPLE 1. During a day, a company received four telephone calls with claims from clients from a homogeneous group. The company knows that the distribution of a particular claim, given that a loss event happened, is exponential with a mean of one (unit of money). However, the real sizes of these particular claims have not yet been evaluated. What is the probability that the cumulative claim will exceed, for example, 5?

We deal with $S_4 = X_1 + X_2 + X_3 + X_4$, where the $X$'s are exponentially distributed with the same parameter $a = 1$. The exponential distribution is the Γ-distribution with parameter $\nu = 1$. Hence, by Proposition 4, the r.v. $S_4$ has the Γ-distribution with parameters $a = 1$ and $\nu = 4$. So, the density

$$f_{S_4}(x) = \frac{1}{\Gamma(4)}x^3 e^{-x} = \frac{1}{6}x^3 e^{-x},$$

and

$$P(S_4 > 5) = 1 - \frac{1}{6}\int_0^5 x^3 e^{-x}\, dx \approx 0.27.$$
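
For an integer parameter $\nu = n$, the tail of the Γ-distribution has a well-known closed form, $P(S_n > x) = e^{-ax}\sum_{k=0}^{n-1} (ax)^k/k!$, which is not stated in the text above but follows from integrating the density by parts. A minimal sketch of the calculation in the example, with a hypothetical function name:

```python
import math

def erlang_tail(n, a, x):
    """P(S_n > x) for the sum of n i.i.d. exponential r.v.'s with parameter a (integer n)."""
    return math.exp(-a * x) * sum((a * x) ** k / math.factorial(k) for k in range(n))

p = erlang_tail(4, 1.0, 5.0)
print(p, round(p, 2))
```

For $n = 4$, $a = 1$, $x = 5$, this gives approximately 0.265, in agreement with the rounded value in the example.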


EXAMPLE 2 ([153, N16]*) includes situations different from what we considered above. Which of the following are true?

1. The sum of two independent negative binomial r.v.'s with parameters $(p_1, \nu_1)$ and $(p_2, \nu_2)$ is negative binomial if and only if $\nu_1 = \nu_2$.

2. The sum of two independent binomial r.v.'s with parameters $(p_1, n_1)$ and $(p_2, n_2)$ is binomial if and only if $n_1 = n_2$.

3. The sum of two independent Poisson r.v.'s with parameters $\lambda_1$ and $\lambda_2$ is Poisson if and only if $\lambda_1 = \lambda_2$.

All of the above are false.

1. First, note that the distribution (0.3.1.15) differs from (0.3.1.13) only by a shift, so the answer should be the same for both distributions. Let $p_1 = p_2 = p$, and $\nu_1, \nu_2$ be positive integers. Let us consider a sequence of independent trials each having probability $p$ of being a success. Denote by $N_{\nu_1}$ the moment when the $\nu_1$-th success occurs, and by $N_{\nu_2}$ the number of trials between the $\nu_1$-th success and the success with the number $\nu_1 + \nu_2$. Then $N_{\nu_1}$ and $N_{\nu_2}$ are independent negative binomial r.v.'s having distribution (0.3.1.13) with the common parameter $p$ and the respective parameters $\nu_1$ and $\nu_2$. The r.v. $N_{\nu_1} + N_{\nu_2}$ is the moment when the success with the number $\nu_1 + \nu_2$ occurs. Hence, $N_{\nu_1} + N_{\nu_2}$ has the negative binomial distribution with parameters $p$, $\nu_1 + \nu_2$. The condition $\nu_1 = \nu_2$ is not necessary, while the condition $p_1 = p_2$ matters.

2. Let $S_1$ and $S_2$ be independent binomial r.v.'s with parameters $(p_1, n_1)$ and $(p_2, n_2)$, respectively. If $p_1 = p_2 = p$, then, as we did above, we may interpret $S_1$ as the number of successes in the first $n_1$ trials, and $S_2$ as the number of successes in $n_2$ trials after the $n_1$-th trial. Thus, $S_1 + S_2$ is the total number of successes in $n_1 + n_2$ trials, and this r.v. has a binomial distribution. Again, the condition $p_1 = p_2$ is essential, while the equality $n_1 = n_2$ is not a necessary condition.

3. From Proposition 3 it follows that $\lambda_1 = \lambda_2$ is not necessary.


In conclusion, note that Propositions 3 and 4 show, above all, that the Poisson and Γ-distributions may be well approximated by normal distributions for large parameters $\lambda$ and $\nu$.

Indeed, consider, for example, independent identically distributed r.v.'s $X_1, X_2, \ldots$ having the exponential distribution with parameter $a$. Then, by Proposition 4, the sum $S_n = X_1 + \ldots + X_n$ has the Γ-distribution with parameters $(a, n)$. On the other hand, by the Central Limit Theorem, $S_n$ is asymptotically normal for large $n$. Consequently, the Γ-distribution with parameters $(a, n)$ is also asymptotically normal for large $n$. The same reasoning applies to the Poisson case. Exercises 43 and 44 contain detailed advice on how to show the same in the general case when $\lambda$ and $\nu$ may not be integers.
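
The quality of the normal approximation for the Γ-distribution with integer $\nu = n$ can be examined numerically. The sketch below compares the exact d.f. (via the standard closed form for integer $n$) with the normal d.f. having the same mean $n/a$ and variance $n/a^2$, and reports the largest discrepancy over a grid. The function names, the grid, and the chosen values of $n$ are illustrative assumptions.

```python
import math

def erlang_cdf(n, a, x):
    """Exact d.f. of the sum of n i.i.d. exponentials with parameter a (integer n)."""
    if x <= 0:
        return 0.0
    term, total = 1.0, 1.0              # running term (a x)^k / k! and its partial sum
    for k in range(1, n):
        term *= a * x / k
        total += term
    return 1.0 - math.exp(-a * x) * total

def normal_cdf(x):
    """Standard normal d.f. expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_gap(n, a=1.0, grid=400):
    """Largest |exact d.f. - normal approximation| within 4 standard deviations of the mean."""
    mean, sd = n / a, math.sqrt(n) / a
    xs = [mean + sd * (-4 + 8 * i / grid) for i in range(grid + 1)]
    return max(abs(erlang_cdf(n, a, x) - normal_cdf((x - mean) / sd)) for x in xs)

for n in (4, 25, 100):
    print(n, max_gap(n))
```

The printed discrepancy decreases as $n$ grows, in line with the Central Limit Theorem.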


Route 1 ⇒ page 166


*Reprinted with permission of the Casualty Actuarial Society.




2.1.3 An additional remark regarding convolutions: 
Stable distributions 


The following notions help us gain a deeper understanding of Propositions 2-4 above.

We say that a class of distributions is closed with respect to convolution if for any two 
independent r.v.’s, X; and X2, having distributions from this class, the distribution of their 
sum X; + X2 also belongs to the same class. For example, as we saw, the class of all normal 
distributions is closed with respect to convolution, and the same is true for the class of all 
Poisson distributions, or the class of all Γ-distributions with a fixed scale parameter $a$.

However, these classes have different structures, and to clarify this, we introduce one 
more notion. 

Consider a r.v. X and the family of r.v.’s Y = a +bX for all possible values of numbers 
a and b > 0. The family of the corresponding distributions is called a type. We refer to 
b as a scale factor, and a as a location constant. One may say that any two distributions 
from a type are the distributions of r.v.’s that may be linearly transformed to each other. In 
particular, this means that not only the r.v. X but any r.v. Y = a+bX with b > 0 may serve 
as the “original” r.v. generating the type. 

For example, since any $(m, \sigma^2)$-normal r.v. may be written as $Y = m + \sigma X$, where $X$ is a standard normal
r.v., normal distributions compose a type. The same is true for the family of all uniform
distributions because the distribution uniform on [s,t] may be represented as the distribution 
of a r.v. Y = s + (t — s)X, where X is uniform on [0,1]. 

On the other hand, Poisson distributions do not compose a type. Indeed, if X is a Poisson 
r.v., then the r.v. a+bX is a r.v. assuming values a, a+b, a+2b, ... rather than 0,1,2,..., and 
hence, it is not Poisson. Therefore, the class of Poisson distributions is a class of another 
nature than a type. 

The same is true for the Γ-distribution because two Γ-distributions with different values of the parameter $\nu$ cannot be reduced to each other by a change of scale. See, for example, the graphs of Γ-densities for different $\nu$'s in Fig.0-13; these functions are essentially different and cannot be reduced to each other by a linear transformation of the argument.

An interesting and important question is which types are closed with respect to convolution. By Proposition 2, the normal type has such a property, while, as we saw in Section 2.1.1, the uniform type is not closed with respect to convolution.

It turns out that the closedness of a type with respect to convolution is a rare property and 
may be considered a characterization property of the normal distribution. 


Proposition 5 The normal type is the only type of distributions with finite variances, 
which is closed with respect to convolution. 


A proof of this proposition is actually not difficult but beyond the scope of this book; see, 
e.g., [27], [38], [122, p. 255]. 

Thus, when the variances are finite, the sum has the same type as the separate terms only in the case of normal distributions. Regarding the classes of Poisson or Γ-distributions that are closed with respect to convolution, we realize that these classes are not types, and the changes that convolution brings about are more essential than a change of scale.

Another matter is that there are distributions with infinite variances whose types have the property under discussion. These distributions are called stable. An example is the Cauchy distribution with density $f(x) = \frac{1}{\pi(1 + x^2)}$. The theory of stable distributions may be found in many textbooks; see, e.g., [27], [38], [120], [122], [129].


2.1.4 The analogue of the binomial formula for convolutions 


Sometimes it is useful to know that the Newton binomial formula applies to convolutions too. Namely, for any distributions $F_1$ and $F_2$, and non-negative numbers $\alpha$ and $\beta$ such that $\alpha + \beta = 1$,

$$(\alpha F_1 + \beta F_2)^{*n} = \sum_{k=0}^{n} \binom{n}{k}\alpha^k \beta^{n-k} F_1^{*k} * F_2^{*(n-k)}. \qquad (2.1.1)$$

Here $F^{*k} = F * \ldots * F$, where the convolution is carried out $k$ times. Detailed advice on how to prove (2.1.1) is given in Exercise 41.


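EXAMPLE 1. Let $f_{a,\nu}$ denote the Γ-density with parameters $a$, $\nu$. Find the density $f = f(x) = (\alpha f_{a,\nu_1}(x) + \beta f_{a,\nu_2}(x))^{*3}$. The reader will see that if $a$ is a common parameter, we can carry out calculations for any $\alpha, \beta, \nu_1, \nu_2$. However, to make the calculations more transparent, let us set $\alpha = \beta = 1/2$, $\nu_1 = 1$, $\nu_2 = 2$. Then, making use of (2.1.1) and (2.1.8), we have

$$f = \frac{1}{2^3} f_{a,1}^{*3} + 3 \cdot \frac{1}{2^2} \cdot \frac{1}{2}\, f_{a,1}^{*2} * f_{a,2} + 3 \cdot \frac{1}{2} \cdot \frac{1}{2^2}\, f_{a,1} * f_{a,2}^{*2} + \frac{1}{2^3} f_{a,2}^{*3}.$$

By (2.1.8), $f_{a,1}^{*3} = f_{a,3}$, $f_{a,1}^{*2} * f_{a,2} = f_{a,2} * f_{a,2} = f_{a,4}$, and similarly, $f_{a,1} * f_{a,2}^{*2} = f_{a,5}$, $f_{a,2}^{*3} = f_{a,6}$. Thus,

$$f = \frac{1}{8}\left(f_{a,3} + 3f_{a,4} + 3f_{a,5} + f_{a,6}\right).$$

Hence,

$$f(x) = \frac{1}{8}\left(\frac{a^3}{2}x^2 e^{-ax} + \frac{3a^4}{6}x^3 e^{-ax} + \frac{3a^5}{24}x^4 e^{-ax} + \frac{a^6}{120}x^5 e^{-ax}\right) = \left(\frac{a^3 x^2}{16} + \frac{a^4 x^3}{16} + \frac{a^5 x^4}{64} + \frac{a^6 x^5}{960}\right)e^{-ax}.$$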

The particular cases considered above are important and interesting but, nevertheless, are special. In general, computing convolutions is tedious and, for large $n$, practically impossible. The method we consider next sometimes allows us to avoid complicated calculations.


2.2 Moment generating functions 


In this section, we use m.g.f.'s. The reader is advised to look up this notion in Section 0.4 and consider at least some problems from Exercises 33-40 of the current chapter.

Let again $S_n = X_1 + \ldots + X_n$, where the $X_i$'s are independent r.v.'s (for example, payments). Let $M_i(z)$ be the m.g.f. of $X_i$. Then, the m.g.f. of $S_n$ is

$$M_{S_n}(z) = M_1(z) \cdot M_2(z) \cdot \ldots \cdot M_n(z).$$

(See property (0.4.1.4).) Multiplication is a much easier operation than convolution. If we can compute the m.g.f.'s of individual terms, we can compute the m.g.f. of the sum. If we are able to determine which distribution the latter m.g.f. represents, then, due to the uniqueness property of m.g.f.'s (see Theorem 8 in Section 0.4.1), the problem will be solved. Sometimes, to do this, it is enough to recognize a familiar m.g.f. In other cases, some calculations are required.

To demonstrate the power of the method of m.g.f.'s, we begin with the three classical examples corresponding to the three convolution cases considered in the previous section in Propositions 2-4. We will see that the m.g.f. method allows us to simplify proofs substantially. For the m.g.f.'s of the distributions considered below, see Section 0.4.3.


I. Sums of normals. 

Let $X_1$ and $X_2$ be normal with expectations $m_1$ and $m_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, and let $S = X_1 + X_2$. Since the m.g.f. of a $(m, \sigma^2)$-normal r.v. is $\exp\{mz + \sigma^2 z^2/2\}$, the m.g.f.

$$M_S(z) = \exp\{m_1 z + \sigma_1^2 z^2/2\}\exp\{m_2 z + \sigma_2^2 z^2/2\} = \exp\{(m_1 + m_2)z + (\sigma_1^2 + \sigma_2^2)z^2/2\}.$$

This is the m.g.f. of the normal distribution with expectation $m_1 + m_2$ and variance $\sigma_1^2 + \sigma_2^2$, which proves Proposition 2.


II. Sums of Poisson r.v.’s. 
Now, let $X_1$ and $X_2$ be Poisson r.v.'s with respective parameters $\lambda_1$ and $\lambda_2$. The m.g.f. of a Poisson r.v. with parameter $\lambda$ is $\exp\{\lambda(e^z - 1)\}$. Then the m.g.f.

$$M_S(z) = \exp\{\lambda_1(e^z - 1)\}\exp\{\lambda_2(e^z - 1)\} = \exp\{(\lambda_1 + \lambda_2)(e^z - 1)\}.$$

This is the m.g.f. of the Poisson distribution with parameter $\lambda_1 + \lambda_2$, and Proposition 3 is proved.


III. Sums of Γ-distributed r.v.'s.
Let $X_1$ and $X_2$ be Γ-r.v.'s with parameters $(a, \nu_1)$ and $(a, \nu_2)$, respectively. Since the m.g.f. of the Γ-r.v. with parameters $(a, \nu)$ is $(1 - z/a)^{-\nu}$, the m.g.f.

$$M_S(z) = (1 - z/a)^{-\nu_1}(1 - z/a)^{-\nu_2} = (1 - z/a)^{-(\nu_1 + \nu_2)}.$$

This is the m.g.f. of the Γ-distribution with parameters $(a, \nu_1 + \nu_2)$, which proves Proposition 4.


Let us consider now a typical example demonstrating how the calculations may proceed 
in the general case. 


EXAMPLE 1. Let $X_1$ and $X_2$ be i.i.d. r.v.'s with common distribution $F = \frac{1}{3}F_1 + \frac{2}{3}F_2$, where $F_1$ and $F_2$ are exponential distributions with means 1 and 2, respectively. So, we consider a mixture of exponentials. Find the distribution of $S = X_1 + X_2$.

The m.g.f. of an exponential r.v. with mean $m$ is $(1 - mz)^{-1}$. (Show this proceeding from results of Section 0.4.3.) The m.g.f. of a mixture of distributions is equal to the mixture of the m.g.f.'s (see Section 0.4.1). Thus, the common m.g.f. of $X_1$ and $X_2$ is

$$M_X(z) = \frac{1}{3} \cdot \frac{1}{1 - z} + \frac{2}{3} \cdot \frac{1}{1 - 2z}.$$

Hence, $M_S(z) = \left(\frac{1}{3} \cdot \frac{1}{1 - z} + \frac{2}{3} \cdot \frac{1}{1 - 2z}\right)^2$.

Next, we apply the method of partial fractions, writing

$$\left(\frac{1}{3} \cdot \frac{1}{1 - z} + \frac{2}{3} \cdot \frac{1}{1 - 2z}\right)^2 = \frac{a}{1 - z} + \frac{b}{1 - 2z} + \frac{c}{(1 - z)^2} + \frac{d}{(1 - 2z)^2}, \qquad (2.2.1)$$

where $a, b, c, d$ are coefficients. It is easy to compute (by finding the common denominator) that for the last equality to be true, one should have $a = -4/9$, $b = 8/9$, $c = 1/9$, $d = 4/9$. Mathematical software such as Maple or Mathematica can do it automatically.

The right member of (2.2.1) may be considered as the mixture of four m.g.f.'s with weights $a, b, c, d$. The first two m.g.f.'s are exponential with respective means 1 and 2, and the last two correspond to the Γ-distributions with the vector-parameter $(a, \nu)$ equal to $(1, 2)$ and $(1/2, 2)$, respectively.

(Indeed, for example, $\frac{1}{1 - z}$ is the m.g.f. of the standard exponential distribution. Then $\frac{1}{(1 - z)^2} = \frac{1}{1 - z} \cdot \frac{1}{1 - z}$ is the m.g.f. of the convolution of two standard exponential distributions. By virtue of Proposition 4, this is the Γ-distribution with parameters $(1, 2)$. The other terms in (2.2.1) are treated similarly.)

The fact that one weight in (2.2.1) is negative should not make us uncomfortable: we consider this representation as a purely mathematical construction.

If the m.g.f. of the sum is the above mixture of the mentioned m.g.f.'s, then the density of the sum is the mixture of the corresponding densities with the same weights. More specifically,

$$f_S(x) = -\frac{4}{9}e^{-x} + \frac{8}{9} \cdot \frac{1}{2}e^{-x/2} + \frac{1}{9}xe^{-x} + \frac{4}{9}\left(\frac{1}{2}\right)^2 xe^{-x/2} = \frac{1}{9}\left[(x + 4)e^{-x/2} + (x - 4)e^{-x}\right] \quad \text{for } x \ge 0, \text{ and } = 0 \text{ for } x < 0. \qquad (2.2.2)$$


One can double check that the last function is positive for 
x > 0 with total integral of one, so it is indeed a density. The 
graph is given in Fig.8. 


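FIGURE 8.

EXAMPLE 2. It is useful to consider another way of solving the problem of Example 1. As usual, denote by $f_{a,\nu}$ the Γ-density with parameters $a$, $\nu$, and by $f$ the density of the sum. By virtue of (2.1.1),

$$f = \left(\frac{1}{3}f_{1,1} + \frac{2}{3}f_{1/2,1}\right)^{*2} = \frac{1}{9}f_{1,1}^{*2} + \frac{4}{9}f_{1,1} * f_{1/2,1} + \frac{4}{9}f_{1/2,1}^{*2}.$$

By (2.1.8), $f_{1,1}^{*2} = f_{1,2}$, and $f_{1/2,1}^{*2} = f_{1/2,2}$.

In order to find $f_{1,1} * f_{1/2,1}$, it is better to apply the m.g.f.'s method. In Exercise 40, the reader is asked to prove that $[f_{1,1} * f_{1/2,1}](x) = e^{-x/2} - e^{-x}$, using a method similar to what we used in Example 1 above. Then

$$f(x) = \frac{1}{9}xe^{-x} + \frac{4}{9}\left(e^{-x/2} - e^{-x}\right) + \frac{4}{9}\left(\frac{1}{2}\right)^2 xe^{-x/2},$$

and we again arrive at (2.2.2). ∎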
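
The density (2.2.2) can also be confirmed by brute force: convolve the mixture density with itself numerically, as in (2.1.3), and compare with the closed form. The sketch below uses the midpoint rule; the function names and step count are illustrative assumptions, and the closed form coded here is $\frac{1}{9}[(x+4)e^{-x/2} + (x-4)e^{-x}]$ for $x \ge 0$.

```python
import math

def mixture_density(x):
    """Density of F = (1/3) Exp(mean 1) + (2/3) Exp(mean 2)."""
    if x < 0:
        return 0.0
    return math.exp(-x) / 3 + (2 / 3) * 0.5 * math.exp(-x / 2)

def sum_density_closed(x):
    """Closed-form density of S = X1 + X2, cf. (2.2.2)."""
    if x < 0:
        return 0.0
    return ((x + 4) * math.exp(-x / 2) + (x - 4) * math.exp(-x)) / 9

def sum_density_numeric(x, steps=4000):
    """Convolution integral (2.1.3) over [0, x], by the midpoint rule."""
    if x <= 0:
        return 0.0
    h = x / steps
    return h * sum(
        mixture_density(x - (i + 0.5) * h) * mixture_density((i + 0.5) * h)
        for i in range(steps)
    )

for x in (0.5, 1.0, 3.0, 8.0):
    print(x, sum_density_closed(x), sum_density_numeric(x))
```

Agreement of the two columns supports both the partial-fraction coefficients and the alternative derivation via the binomial formula.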




In general, even if we have found the m.g.f. of a sum of r.v.'s, finding the corresponding distribution may turn out to be difficult. As we will see later, the method of m.g.f.'s proves useful mainly for qualitative theoretical analysis.

However, there is one more fundamental difficulty, which is unrelated to analytical methods. In practice, we do not know the distributions of the addends $X_i$ precisely. As a rule, we can only estimate some parameters of these distributions, for example, means, variances, and some moments. Accordingly, we can obtain at best only approximations of the distributions of sums. The most important such approximation is the one associated with the normal distribution and based upon various modifications of the CLT.


3 PREMIUMS AND SOLVENCY.
APPROXIMATIONS FOR AGGREGATE CLAIM DISTRIBUTIONS


3.1 Premiums and normal approximation. A heuristic approach 
3.1.1 Normal approximation and security loading 


We already touched on the method of finding premiums with the use of normal approximation in Section 1.2.1. Now we will do it in much more detail.

Let $S_n = X_1 + \ldots + X_n$, where the $X_i$'s are r.v.'s. For now, we do not assume they are independent or identically distributed. Consider the normalized sum

$$S_n^* = \frac{S_n - E\{S_n\}}{\sqrt{Var\{S_n\}}}. \qquad (3.1.1)$$

The goal of normalization is to consider the sum $S_n$ in an appropriate scale; namely, after normalization, $E\{S_n^*\} = 0$ and $Var\{S_n^*\} = 1$ (see also Section 0.2.6).

Modern probability theory establishes a wide spectrum of conditions under which the distribution of $S_n^*$ is asymptotically normal; that is, conditions under which for any $x$,

$$P(S_n^* \le x) \to \Phi(x) \text{ as } n \to \infty, \qquad (3.1.2)$$

where $\Phi(x)$ is the standard normal d.f.

In the simplest case where the separate terms (or addends) $X_i$ are independent and identically distributed (i.i.d.), and have a finite variance, (3.1.2) is always true (see Section 0.6.2). In more general situations, some conditions are needed, but it is worth emphasizing that these conditions are fairly mild. Let us discuss it in more detail.

If the addends are independent but not identically distributed, the corresponding conditions require that there are no addends that in a certain sense have the same order as the whole sum. One sufficient condition of such a type will be established in Section 3.2.

Independence of addends is also not necessary. The theory of normal approximation for dependent addends is now well developed and deals with a wide variety of types of dependency. In each case, the corresponding condition of asymptotic normality means that the random addends may be dependent but “not too strongly”.




Probably, the simplest example for illustrative purposes is the so-called $m$-dependence when each term in the sum depends only on the “nearest” $m$ r.v.'s. More specifically, $X_i$ depends only on r.v.'s $X_j$ for which $i - m \le j \le i + m$. The last collection of r.v.'s may be viewed as a “dependency neighborhood”.

For example, if $m = 1$, the r.v. $X_1$ depends only on $X_2$, the r.v. $X_2$ depends only on $X_1$ and $X_3$, the r.v. $X_3$ depends only on $X_2$ and $X_4$, and so on. So, the dependency neighborhood of a r.v. consists of three r.v.'s (counting the r.v. itself).

As a matter of fact, one may impose a much weaker condition. For example, it is sufficient to require that the dependence between two terms, $X_i$ and $X_j$, becomes (in a certain sense and at a certain rate) weaker as the “distance” $|i - j|$ becomes larger. Loosely put, the addends which are far away from each other are weakly dependent. Requirements of such a type are called mixing conditions.

Some central limit theorems for dependent r.v.'s may be found in [120], [122]; a more systematic exposition, e.g., in [63], [70]. Graph-related dependency structures described in terms of dependency neighborhoods may be found, e.g., in [113]; economic models based on such structures in [87], [88]; see also references therein.


Once asymptotic normality is established, we can use it to estimate, for example, the 
premium for which the probability that the company will not suffer a loss is larger than a 
given level. 

Suppose that the company specifies the lowest acceptable level $\beta$ for the mentioned probability. For instance, $\beta$ may be equal to 0.9 or 0.8. Often, insurance companies connect such a probability with the investment rate in the financial market and the corresponding probabilities of default. It may lead to a very high $\beta$ like 0.99. As a matter of fact, $\beta$ does not have to be very close to one. If, say, $\beta = 0.8$, this means that in the long run, the company will make a profit 80% of the time on average. If the single period which we consider here is short, this is not bad at all.

EXAMPLE 1. Consider k periods of time. Let the probability of avoiding a loss in
a separate period be $\beta$, and let results for different periods be independent. Reasoning
very roughly, assume that the company will not suffer a loss overall if the number of
non-successful periods does not exceed the number of successful periods, which means that the
number of successful periods is not less than k/2. Denote the probability of this event by
$p = p(k,\beta)$. This is the binomial probability that during k independent trials, the number of
successes will not be less than k/2. More precisely,
$p(k,\beta) = \sum_{j \ge k/2} \binom{k}{j} \beta^j (1-\beta)^{k-j}$. By
using Excel or a calculator in statistics mode, it is easy to verify the data in the following table.


TABLE 1.


           | k=3           | k=3            | k=3           | k=12          | k=12          | k=12
           | $\beta$=0.8   | $\beta$=0.85   | $\beta$=0.9   | $\beta$=0.6   | $\beta$=0.7   | $\beta$=0.8
$p(k,\beta)$ | 0.896       | 0.939          | 0.972         | 0.841         | 0.961         | 0.996


Let one period be a year, and let the total period be (only) three years (k = 3). Then the
probability of not suffering a loss at the end of the three-year period is approximately 0.9
for $\beta = 0.8$. For $\beta = 0.9$, it is close to 0.97.

If one period is a month and the total period is a year (k = 12), then even $\beta = 0.6$ is not
bad since p in this case is about 0.84. For $\beta = 0.8$, we have p = 0.996, which is fairly high.


3. Premiums and the Solvency of Insurance 171


Note, however, that if a single period under consideration is a month, then the probability
q of a loss event for separate clients may be small. In the numerical examples shown below,
the choice of q corresponds rather to a single period of a year.
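
The entries of Table 1 can also be checked with a few lines of code instead of Excel. The following Python sketch is only an illustration: the function name is ours, and it assumes that the sum runs over all j not less than k/2, as in the formula for $p(k,\beta)$.

```python
from math import comb


def prob_no_overall_loss(k: int, beta: float) -> float:
    """Binomial probability that at least k/2 of k independent periods are successful."""
    j_min = (k + 1) // 2  # smallest integer j with j >= k/2
    return sum(
        comb(k, j) * beta**j * (1 - beta) ** (k - j)
        for j in range(j_min, k + 1)
    )


# The (k, beta) pairs are those of Table 1.
for k, beta in [(3, 0.8), (3, 0.85), (3, 0.9), (12, 0.6), (12, 0.7), (12, 0.8)]:
    print(f"k={k:2d}  beta={beta:.2f}  p={prob_no_overall_loss(k, beta):.3f}")
```

For k = 3 the sum contains only the terms j = 2 and j = 3, so the values can also be verified by hand.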


Let us return to the general scheme and consider one period of time. Assume that the
premium the company collects is proportional to the expected payment. More specifically,
the total premium

$$c_n = (1+\theta)E\{S_n\}. \tag{3.1.3}$$

As was already mentioned in Section 1.5, the coefficient $\theta$ is called a relative security
loading. The quantity $\theta E\{S_n\}$ is called a security loading. Similar to Section 1.2.1, for
the least acceptable premium, we may write

$$\beta = P(S_n \le c_n) = P(S_n - E\{S_n\} \le \theta E\{S_n\})
= P\left(\frac{S_n - E\{S_n\}}{\sqrt{\mathrm{Var}\{S_n\}}} \le \frac{\theta E\{S_n\}}{\sqrt{\mathrm{Var}\{S_n\}}}\right)
= P\left(S_n^* \le \theta E\{S_n\}/\sqrt{\mathrm{Var}\{S_n\}}\right), \tag{3.1.4}$$

where the normalized sum $S_n^*$ is defined in (3.1.1).


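We start with non-rigorous estimation. If we consider normal approximation acceptable,
we can write that $P(S_n^* \le x) \approx \Phi(x)$ for any x, and in particular,

$$P\left(S_n^* \le \theta E\{S_n\}/\sqrt{\mathrm{Var}\{S_n\}}\right) \approx \Phi\left(\theta E\{S_n\}/\sqrt{\mathrm{Var}\{S_n\}}\right).$$

Thus,

$$\Phi\left(\theta E\{S_n\}/\sqrt{\mathrm{Var}\{S_n\}}\right) \approx \beta.$$

This, in turn, implies that

$$\theta \approx \frac{q_{\beta s}\sqrt{\mathrm{Var}\{S_n\}}}{E\{S_n\}}, \tag{3.1.5}$$

where $q_{\beta s}$ is the $\beta$-quantile of the standard normal distribution, which means that
$\Phi(q_{\beta s}) = \beta$. For instance, if $\beta = 0.9$, then $q_{\beta s} = 1.28\ldots$.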

Consider, for a while, the particular case of independent and identically distributed (i.i.d.)
r.v.’s $X_i$. Let $m = E\{X_i\}$, $\sigma^2 = \mathrm{Var}\{X_i\}$. Then $E\{S_n\} = mn$, $\mathrm{Var}\{S_n\} = \sigma^2 n$, and, as is easy
to verify, (3.1.5) may be rewritten as

$$\theta \approx \frac{q_{\beta s}\,\sigma}{m\sqrt{n}}. \tag{3.1.6}$$

For a distribution with mean m and standard deviation $\sigma$, the fraction $\sigma/m$ is called a
coefficient of variation. We see that $\theta$ is specified by the coefficient of variation of the
addends $X_i$ and the number of clients. Moreover, in view of (3.1.3), the total premium may
be distributed between individual clients in accordance with the relative loading $\theta$, which
amounts to the premium $(1+\theta)m$ for each client.

Given the coefficient $\sigma/m$, the larger the number of clients, the smaller the security loading
needed by the company to maintain a certain level of security. More precisely, the loading
coefficient required is proportional to $1/\sqrt{n}$.




Assume that a certain number n of clients agree to buy insurance with a loading coefficient
larger than or equal to the loading coefficient in (3.1.6). Then the company can
function at a level of security higher than or equal to the desirable level $\beta$. This reflects
the essence of the matter; namely, that insurance is efficient when the risk is redistributed
among a sufficiently large number of clients. We discuss this issue in more detail in Section
3.3.

Let us come back to the general case. Set $m_i = E\{X_i\}$. Then

$$E\{S_n\} = m_1 + \ldots + m_n = n\bar{m}_n, \tag{3.1.7}$$

where

$$\bar{m}_n = \frac{1}{n}(m_1 + \ldots + m_n),$$

the average expectation.

In general, to find $\mathrm{Var}\{S_n\}$ we should know the dependency structure between the X’s.
In this book, we mostly restrict ourselves to the case of independent addends. In this case,

$$\mathrm{Var}\{S_n\} = \sigma_1^2 + \ldots + \sigma_n^2 = n\bar{\sigma}_n^2, \tag{3.1.8}$$

where $\sigma_i^2 = \mathrm{Var}\{X_i\}$, and

$$\bar{\sigma}_n^2 = \frac{1}{n}(\sigma_1^2 + \ldots + \sigma_n^2), \tag{3.1.9}$$

the average variance.

> Note that the independence of X’s is not necessary for (3.1.8) to be true: we need only
the X’s to be non-correlated (see Section 0.2.4.3).

Moreover, if the r.v.’s are correlated, this does not exclude the possibility of normal
approximation, and the CLT may still be true under certain conditions. In this case, we should
just write a correct representation for $\mathrm{Var}\{S_n\}$. The reader familiar with the notion of
covariance (see again Section 0.2.4.3 for detail) knows that, in general,

$$\mathrm{Var}\{S_n\} = \sigma_1^2 + \ldots + \sigma_n^2 + 2\sum_{i>j} \mathrm{Cov}\{X_i, X_j\}.$$

In this case, we can use the approximation below, defining $\bar{\sigma}_n^2$ as $\mathrm{Var}\{S_n\}/n$. <

From (3.1.7) and (3.1.8) it follows that we can rewrite (3.1.5) as

$$\theta \approx \frac{q_{\beta s}\,\bar{\sigma}_n}{\bar{m}_n\sqrt{n}}. \tag{3.1.10}$$

This is similar to (3.1.6) and shows that when the average characteristics $\bar{\sigma}_n$ and $\bar{m}_n$ are
bounded, $\theta$ again has an order of $1/\sqrt{n}$.

The choice between the equivalent relations (3.1.5) and (3.1.10) is a matter of convenience.
Though the latter explicitly shows the structure of the relative security loading $\theta$, relation
(3.1.5) may turn out to be more convenient in calculations.

In the general case (3.1.10), the situation is the same as in the case of i.i.d. r.v.’s, but the
role of the coefficient of variation is played by $\bar{\sigma}_n/\bar{m}_n$, the ratio of the average
characteristics $\bar{\sigma}_n$ and $\bar{m}_n$. Furthermore, once $\theta$ is determined, the ith client pays $(1+\theta)m_i$.




It makes sense to note also that $\theta$ in (3.1.10) should not be viewed as the real security
loading coefficient to be used by the company. If the law and circumstances allow it, the
company may proceed from a larger $\theta$. The coefficient in (3.1.10) is the minimal coefficient
acceptable for the company.


EXAMPLE 2. Consider a homogeneous group of n = 2000 clients. Assume that the
probability of a loss event for each client is q = 0.1 and, if a loss event occurs, the payment
is a r.v. uniformly distributed on [0,1]. In accordance with (1.2.6)-(1.2.7), the expected
value and the variance of the separate payment X are $m = q \cdot \frac{1}{2} = 0.05$ and
$\sigma^2 = q \cdot \frac{1}{12} + q(1-q)\left(\frac{1}{2}\right)^2 \approx 0.0308$. (The reader may look up the mean and the variance of the uniform
distribution in Section 0.3.2.1.) Let $\beta = 0.9$. Then the quantile $q_{0.9,s} \approx 1.281$ (see Table 2
in Appendix, Section 2).

Since the X’s are identically distributed, we use (3.1.6), which gives

$$\theta \approx \frac{1.281 \cdot \sqrt{0.0308}}{0.05 \cdot \sqrt{2000}} \approx 0.1005.$$

This approximately amounts to a 10% loading. The premium for each client is
$(1+\theta)m \approx (1+0.1) \cdot 0.05 = 0.055$ units of money.
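
The calculation in Example 2 can be reproduced numerically. The sketch below is an illustration only: the helper name `heuristic_loading` is ours, and the quantile is taken from Python's standard library rather than from the table in the Appendix, so the last digits differ slightly from 0.1005.

```python
from math import sqrt
from statistics import NormalDist


def heuristic_loading(mean_total: float, var_total: float, beta: float) -> float:
    """Relative security loading from (3.1.5), based on the normal approximation."""
    q = NormalDist().inv_cdf(beta)  # beta-quantile of the standard normal distribution
    return q * sqrt(var_total) / mean_total


# Example 2: n = 2000 clients, loss probability q = 0.1, payment uniform on [0, 1].
n, q = 2000, 0.1
m = q * 0.5                             # mean of a separate payment
var = q / 12 + q * (1 - q) * 0.5**2     # variance of a separate payment

theta = heuristic_loading(n * m, n * var, beta=0.9)
print(f"theta = {theta:.4f}, premium per client = {(1 + theta) * m:.4f}")
```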


EXAMPLE 3. Assume that a portfolio of a company consists of two homogeneous
groups of risks. For the first group, the number of clients $n_1 = 2000$ and the probability of a
loss event for each client is $q_1 = 0.1$. The payment, if a loss event occurs, is a non-random
amount of $z_1 = 10$ units of money. For the second group, the corresponding quantities are
$n_2 = 400$, $q_2 = 0.05$, and $z_2 = 30$. In particular, $n = n_1 + n_2 = 2400$.

Assume that the loss events are independent and $\beta = 0.9$. For a particular payment X
(the index is omitted), $E\{X\} = qz$ and $\mathrm{Var}\{X\} = z^2 q(1-q)$. Hence,

$$E\{S_n\} = n_1 q_1 z_1 + n_2 q_2 z_2 = 2600,$$
$$\mathrm{Var}\{S_n\} = n_1 q_1 (1-q_1) z_1^2 + n_2 q_2 (1-q_2) z_2^2 = 35100.$$

Then, by (3.1.5),

$$\theta \approx \frac{1.281 \cdot \sqrt{35100}}{2600} \approx 0.092,$$

that is, about 9.2%. Each client from the first group should pay a premium of
$(1+\theta)q_1 z_1 \approx 1.092$, while for the second group, the individual premium is $(1+\theta)q_2 z_2 \approx 1.638$.
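
For a portfolio of several homogeneous groups, the same computation is a matter of summing means and variances over the groups. The following Python sketch illustrates this for the two groups of Example 3; the variable names are ours, and independence of all loss events is assumed, as in the example.

```python
from math import sqrt
from statistics import NormalDist

# (number of clients, loss probability, fixed payment) for each group of Example 3
groups = [(2000, 0.1, 10.0), (400, 0.05, 30.0)]
beta = 0.9

mean_total = sum(n * q * z for n, q, z in groups)              # E{S_n}
var_total = sum(n * q * (1 - q) * z**2 for n, q, z in groups)  # Var{S_n}

theta = NormalDist().inv_cdf(beta) * sqrt(var_total) / mean_total  # relation (3.1.5)
premiums = [(1 + theta) * q * z for _, q, z in groups]             # premium per client

print(f"E = {mean_total:.1f}, Var = {var_total:.1f}, theta = {theta:.4f}")
print("individual premiums:", [round(c, 3) for c in premiums])
```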


3.1.2 An important remark: the standard deviation principle 


The representation of the premium as

$$c_n = (1+\theta)E\{S_n\} \tag{3.1.11}$$

is traditional. However, as we saw above, the loading coefficient $\theta$ depends on the
distributions of X’s. In the limiting case, it depends on the mean and variance of $S_n$. So, as a matter
of fact, $E\{S_n\}$ appears in the r.-h.s. of (3.1.11) twice: as the second factor and, implicitly,
in $\theta$.




Another form of premium representation, which is in a certain sense more convenient and
natural, is determined by the relation

$$c_n = E\{S_n\} + \lambda\sigma_{S_n}, \tag{3.1.12}$$

where the symbol $\sigma_X$ denotes the standard deviation of a r.v. X and the coefficient $\lambda$ indicates
loading with respect to standard deviation. The representation (3.1.12) corresponds to the
so-called standard deviation principle.

Following the same logic as in (3.1.4), we can write

$$\beta = P(S_n \le c_n) = P(S_n - E\{S_n\} \le \lambda\sigma_{S_n})
= P\left(\frac{S_n - E\{S_n\}}{\sigma_{S_n}} \le \lambda\right) = P(S_n^* \le \lambda).$$

Then, using normal approximation, we write $P(S_n^* \le \lambda) \approx \Phi(\lambda)$. Thus, up to normal
approximation,

$$\lambda \approx q_{\beta s}; \tag{3.1.13}$$

that is, $\lambda$ does not depend on X’s and coincides with the $\beta$-quantile of the standard normal
distribution.

Assume, for simplicity, that the X’s are identically distributed. Set $m = E\{X_i\}$ and
$\sigma^2 = \mathrm{Var}\{X_i\}$. Then $E\{S_n\} = mn$, $\mathrm{Var}\{S_n\} = \sigma^2 n$ and, in view of (3.1.12) and (3.1.13),

$$c_n = mn + q_{\beta s}\,\sigma\sqrt{n}.$$

Hence, the premium per one client is

$$c = \frac{c_n}{n} = m + \frac{q_{\beta s}\,\sigma}{\sqrt{n}}.$$

This is a nice result. The mean payment m may be viewed as the net premium, that is, the
part of the premium which does not take the risk carried by the insurer into account. The
additional part, which may be viewed as a payment for risk or security loading, is equal
to $q_{\beta s}\sigma/\sqrt{n}$. It does not depend on m, which is natural, and is specified only by the standard
deviation and the number of clients n.

In conclusion, note that the representation (3.1.12)-(3.1.13) does not contradict (3.1.11)
with $\theta$ given in (3.1.10). Indeed, from (3.1.11)-(3.1.12) it follows that
$(1+\theta)E\{S_n\} = E\{S_n\} + \lambda\sigma_{S_n}$. If we take the representation (3.1.5), which is equivalent to (3.1.10), we
readily get that

$$\lambda = \frac{\theta E\{S_n\}}{\sigma_{S_n}} \approx \frac{q_{\beta s}\sqrt{\mathrm{Var}\{S_n\}}}{E\{S_n\}} \cdot \frac{E\{S_n\}}{\sigma_{S_n}} = q_{\beta s}.$$

Thus, (3.1.12) and (3.1.13) do not determine a new rule of premium determination but
rather represent the previous rule in another (and nicer) form. For more detail on the
standard deviation principle see Section 4.


Route 1 = page 178 




3.2 A rigorous estimation 


Our next step is to eliminate the sign $\approx$, providing rigorous estimates for $\theta$. To this
end, we should know the accuracy of normal approximation or, in other words, the rate of
convergence in (3.1.2). In this section, we consider independent $X_i$’s.

Let $\mathrm{Var}\{X_i\} = \sigma_i^2$ and $B_n^2 = \sigma_1^2 + \ldots + \sigma_n^2$, the variance of the sum $S_n$.

Now, let $\mu_i = E\{|X_i - m_i|^3\}$, the third central absolute moment (in short, the third
moment) of the r.v. $X_i$. Set

$$L_n = \frac{1}{B_n^3}\sum_{i=1}^{n} \mu_i. \tag{3.2.1}$$

First of all, note that in the case of i.i.d. $X_i$’s, if all $\sigma_i = \sigma$ and all $\mu_i = \mu$, then $B_n^2 = \sigma^2 n$ and

$$L_n = \frac{1}{\sigma^3 n^{3/2}}\,\mu n = \frac{1}{\sqrt{n}}\,\frac{\mu}{\sigma^3}. \tag{3.2.2}$$

So, $L_n$ has an order of $1/\sqrt{n}$.

The quantity $L_n$, which is called Lyapunov’s fraction, plays an essential role in the theory
of normal approximation. First, note that $L_n$ does not have a dimension. If we measure the
X’s in dollars, then the dimensions of the numerator and the denominator in (3.2.1) are the
same: (dollars)$^3$.

Secondly, the characteristic $L_n$ is not sensitive to a change of scale. If we multiply all X’s
by a constant c > 0, then the denominator and the numerator in (3.2.1) will be multiplied
by $c^3$, and $L_n$ will not change. (If we multiply X’s by a negative c, the denominator and the
numerator will be multiplied by $|c|^3$.)

The same concerns shifting. If we add the same number to all X’s, this does not change
$L_n$. The reader is invited to check this rigorously in Exercise 49.

The Lyapunov fraction may be considered a characteristic showing the extent to which 
the r.v.’s X; differ. The following examples will help to clarify this. 


We have already seen that in the i.i.d. case, $L_n$ is proportional to $1/\sqrt{n}$. Consider the
opposite, in a sense, case. Let $X_1$ be a r.v. with a positive variance, while all other $X_i = 0$,
$i \ge 2$. Then $B_n = \sigma_1$, and

$$L_n = \frac{\mu_1}{\sigma_1^3}.$$

If the $X_i$’s are not identically distributed but have “the same order”, then the characteristic
$L_n$ has the same order as that in (3.2.2). More precisely, this means the following.
Let all $\mu_i \le \mu$ and all $\sigma_i \ge \sigma$ for some positive constants $\mu$ and $\sigma$. Then $B_n^2 \ge \sigma^2 n$, and

$$L_n \le \frac{1}{\sigma^3 n^{3/2}}\,\mu n = \frac{1}{\sqrt{n}}\,\frac{\mu}{\sigma^3}.$$

The significance of the Lyapunov fraction is demonstrated by the following celebrated
Berry-Esseen theorem.
Let $F_n^*(x) = P(S_n^* \le x)$, the d.f. of the r.v. $S_n^*$.




Theorem 6 There exists an absolute constant C such that for any x

$$|F_n^*(x) - \Phi(x)| \le C L_n. \tag{3.2.3}$$

Proofs of this theorem and comments may be found in many advanced textbooks on
Probability, e.g., in [27], [38].
Regarding the constant C, the latest results show that

$$C \le 0.56. \tag{3.2.4}$$

More detailed discussion and references may be found in [122, p.259].
From (3.2.3), we immediately get that $F_n^*(x) \to \Phi(x)$ if

$$L_n \to 0.$$


This is a sufficient condition for normal convergence for the case of non-identically dis- 
tributed r.v.’s. However, as a matter of fact, Theorem 6 gives us more, namely, the accuracy 
of normal approximation. 

Note that the bound in (3.2.3) is universal, that is, it is the same for all x’s and for all 
distributions with the same third moments and variances. This means that the bound is 
oriented to the worst case among all distributions mentioned and all x’s. In many particular 
cases, the real rate may be much better, especially for large x’s. Many results of this type 
and further discussions may be found, e.g., in [14], [100], [122]. 


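We apply the Berry-Esseen theorem to estimate the security loading coefficient $\theta$.

Denote by $\Delta_n$ the r.-h.s. of (3.2.3), that is, the rate. Let a premium $c_n$ be defined as in
Section 3.1.1. We want $P(S_n \le c_n)$ to be not smaller than a given $\beta$. Similar to (3.1.4), we
have

$$P(S_n \le c_n) = P\left(S_n^* \le \theta E\{S_n\}/\sqrt{\mathrm{Var}\{S_n\}}\right) = F_n^*\left(\theta E\{S_n\}/\sqrt{\mathrm{Var}\{S_n\}}\right)
\ge \Phi\left(\theta E\{S_n\}/\sqrt{\mathrm{Var}\{S_n\}}\right) - \Delta_n, \tag{3.2.5}$$

where in the last step, we have used (3.2.3).
Let us choose $\theta$ such that

$$\Phi\left(\theta E\{S_n\}/\sqrt{\mathrm{Var}\{S_n\}}\right) \ge \beta + \Delta_n. \tag{3.2.6}$$

Then from (3.2.5) it will follow that $P(S_n \le c_n) \ge \beta + \Delta_n - \Delta_n = \beta$. Thus, for $\theta$ from (3.2.6),
$P(S_n \le c_n) \ge \beta$.
A solution to (3.2.6) is given by the inequality

$$\frac{\theta E\{S_n\}}{\sqrt{\mathrm{Var}\{S_n\}}} \ge q_{\beta+\Delta_n,s},$$

and

$$\theta \ge \frac{q_{\beta+\Delta_n,s}\sqrt{\mathrm{Var}\{S_n\}}}{E\{S_n\}}. \tag{3.2.7}$$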




Comparing (3.2.7) with (3.1.5), we see that the difference consists in replacing the
quantile $q_{\beta s}$ by a larger quantile $q_{\beta+\Delta_n,s}$. In particular, estimate (3.2.7) is larger than the heuristic
estimate (3.1.5). (It is natural that in order not to rely on heuristic calculations, we increase
the functioning reliability by establishing a larger premium.)

Similar to (3.1.10), the estimate (3.2.7) may be rewritten as

$$\theta \ge \frac{q_{\beta+\Delta_n,s}\,\bar{\sigma}_n}{\bar{m}_n\sqrt{n}}. \tag{3.2.8}$$


EXAMPLE 1. Let a group of clients be homogeneous, and let the probability of the loss
event for each client be q. The payment, if a loss event occurs, is a certain amount z.

For any particular X (the index is omitted), the mean $m = qz$, the variance $\sigma^2 = z^2 q(1-q)$,
and the third moment $\mu = E\{|X - qz|^3\} = |z - qz|^3 q + |0 - qz|^3 (1-q)$. Since z > 0 and
q < 1, we have $\mu = z^3(1-q)^3 q + z^3 q^3 (1-q) = z^3 q(1-q)(1 - 2q + 2q^2)$.

Because the group is homogeneous, it suffices to consider one X, for which

$$\frac{\mu}{\sigma^3} = \frac{z^3 q(1-q)(1-2q+2q^2)}{(z^2 q(1-q))^{3/2}} = \frac{1-2q+2q^2}{\sqrt{q(1-q)}}. \tag{3.2.9}$$

The last expression does not depend on z, which is understandable: $\mu/\sigma^3$ is not sensitive to a
change of scale.

Now, let q = 0.1, $\beta = 0.9$, and n = 2000. In this case, as is easy to compute, $\mu/\sigma^3 \le 2.734$,
and in view of (3.2.4),

$$\Delta_n = C\,\frac{1}{\sqrt{n}}\,\frac{\mu}{\sigma^3} \le 0.03425.$$

Then $q_{\beta+\Delta_n,s} \le q_{0.93425,s} \le 1.5082$. If we take the last number as an estimate for $q_{\beta+\Delta_n,s}$,
then (3.2.8) will be true, and hence $P(S_n \le (1+\theta)E\{S_n\})$ will not be less than $\beta$.
The group is homogeneous. So, $m = qz$ and $\sigma^2 = z^2 q(1-q)$. Then, the coefficient of
variation $\sigma/m = \sqrt{(1-q)/q} = 3$. Eventually, using (3.2.8), we come to the inequality

$$\theta \ge \frac{1.5082 \cdot 3}{\sqrt{2000}} \approx 0.101.$$

As is easy to compute using $q_{\beta s}$ instead of $q_{\beta+\Delta_n,s}$, the heuristic estimate (3.1.10) gives us
$\theta \approx 0.086$.

It is interesting to compare results for different values of n and $\beta$.

Denote by $\theta_{n0}$ and $\theta_n$ the estimates given by (3.1.10) and (3.2.8), respectively. The reader
is invited to verify the figures in the following Table 1.


TABLE 1.


                          | $\Delta_n$ | $q_{\beta s}$ | $q_{\beta+\Delta_n,s}$ | $\theta_{n0}$ | $\theta_n$
n = 2000, $\beta$ = 0.8   | 0.034 | 0.841 | 0.971 | 0.056 | 0.065
n = 2000, $\beta$ = 0.9   | 0.034 | 1.281 | 1.508 | 0.086 | 0.101
n = 2000, $\beta$ = 0.95  | 0.034 | 1.645 | 2.150 | 0.110 | 0.144
n = 8000, $\beta$ = 0.9   | 0.017 | 1.281 | 1.386 | 0.042 | 0.046
n = 8000, $\beta$ = 0.95  | 0.017 | 1.645 | 1.840 | 0.055 | 0.062



We see that the discrepancy between $\theta_{n0}$ and $\theta_n$ increases for larger $\beta$’s. On the other
hand, as n increases, the difference between $\theta_{n0}$ and $\theta_n$ decreases. A more detailed
discussion is relegated to Exercise 50b and remarks there.


An important remark. Let us return to the case n = 2000 and $\beta = 0.9$, for which the
heuristic estimate $\theta_{n0} \approx 0.086$, while $\theta_n \approx 0.101$. The comparison of these two numbers
should not lead us to the wrong conclusion that the real error of the heuristic estimate is
of the order of 0.101 − 0.086 = 0.015. This is not a real error but an estimate of the error
obtained by using Theorem 6. In other words, 0.101 is the loading which guarantees the
given security level due to Theorem 6. However, this does not exclude the existence of an
advanced result that could give an estimate closer to the heuristic one. Such results indeed
exist; some of them may be found, e.g., in monographs [14], [100].


EXAMPLE 2. Now consider the portfolio of the two groups from Example 3.1.1-3.
We have already shown how to compute $\mu$ and $\sigma$ for a particular X. Then, for the whole
portfolio,

$$L_n = \frac{n_1 z_1^3 q_1(1-q_1)(1-2q_1+2q_1^2) + n_2 z_2^3 q_2(1-q_2)(1-2q_2+2q_2^2)}{\left(n_1 z_1^2 q_1(1-q_1) + n_2 z_2^2 q_2(1-q_2)\right)^{3/2}}.$$

It is straightforward to calculate, using the particular values from Example 3.1.1-3, that
$L_n \approx 0.093$.
Hence, $\Delta_n = C L_n \approx 0.560 \cdot 0.093 \approx 0.052$. Let $\beta = 0.8$. Then $q_{\beta+\Delta_n,s} \le q_{0.852,s} \approx 1.045$.
In Example 3.1.1-3, we have found that $E\{S_n\} \approx 2600$ and $\mathrm{Var}\{S_n\} = 35100$.
Then the bound (3.2.7) gives the value $\theta_n \approx 1.045 \cdot \sqrt{35100}/2600 \approx 0.075$, while by (3.1.10),
$\theta_{n0} \approx 0.842 \cdot \sqrt{35100}/2600 \approx 0.061$.
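
The Lyapunov fraction of this two-group portfolio can be evaluated directly from (3.2.1). The Python sketch below is an illustration only; the helper name is ours, and the third moment of a separate payment is taken in the closed form derived in Example 1 of this section.

```python
# (number of clients, loss probability, fixed payment) for each group
groups = [(2000, 0.1, 10.0), (400, 0.05, 30.0)]


def third_abs_moment(q: float, z: float) -> float:
    """E|X - qz|^3 for X equal to z with probability q and to 0 otherwise."""
    return z**3 * q * (1 - q) * (1 - 2 * q + 2 * q**2)


sum_mu = sum(n * third_abs_moment(q, z) for n, q, z in groups)  # numerator of (3.2.1)
var_total = sum(n * z**2 * q * (1 - q) for n, q, z in groups)   # B_n^2
lyapunov = sum_mu / var_total**1.5                              # L_n
delta = 0.56 * lyapunov                                         # Delta_n with C = 0.56

print(f"L_n = {lyapunov:.4f}, Delta_n = {delta:.4f}")
```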


EXAMPLE 3. Let us return to Example 3.1.1-2. We have already computed that m = 0.05
and $\sigma^2 \approx 0.0308$. To apply (3.2.7), we need to calculate the third moment

$$\mu = E\{|X - m|^3\} = E\{|X - 0.05|^3\} = |0 - 0.05|^3 (1-q) + q\int_0^1 |x - 0.05|^3\,dx \approx 0.020$$

for q = 0.1. The above value may be obtained either by direct integration or by use of
software.
Since the group is homogeneous,

$$\Delta_n = \frac{C\mu}{\sigma^3\sqrt{n}} \approx \frac{0.56 \cdot 0.02}{(0.0308)^{3/2}\sqrt{2000}} \approx 0.046.$$

Eventually,

$$\theta \ge \frac{q_{0.9+0.046,s}\sqrt{0.0308}}{0.05 \cdot \sqrt{2000}} \approx 0.126.$$
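
The third moment in this example can be obtained in closed form by splitting the integral at x = m. The following Python sketch is an illustration only, with names chosen by us. It works with unrounded values of $\mu$ and $\sigma^2$, so it returns $\Delta_n$ and $\theta$ slightly larger than the rounded figures 0.046 and 0.126.

```python
from math import sqrt
from statistics import NormalDist

q, m, n, beta, C = 0.1, 0.05, 2000, 0.9, 0.56
var = q / 12 + q * (1 - q) * 0.25       # sigma^2, about 0.0308

# Integral of |x - m|^3 over [0, 1], split at x = m and integrated exactly.
integral = (m**4 + (1 - m) ** 4) / 4
mu = m**3 * (1 - q) + q * integral      # third absolute central moment

delta = C * mu / (var**1.5 * sqrt(n))   # Delta_n = C * mu / (sigma^3 * sqrt(n))
theta = NormalDist().inv_cdf(beta + delta) * sqrt(var) / (m * sqrt(n))  # bound (3.2.7)

print(f"mu = {mu:.4f}, Delta_n = {delta:.4f}, theta >= {theta:.4f}")
```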


3.3 The number of contracts needed to maintain a given security 
level 


Above, given a risk portfolio, we were seeking a $\theta$ ensuring that the probability of not
suffering a loss is not less than a given level $\beta$. Assume now that $\theta$ is given and we are
looking for a sufficient number of contracts n for which the probability mentioned is sufficiently
large. This is a natural statement of the problem. A premium is a market characteristic and
it is determined not only by companies but by clients also, in accordance with their attitude
to risk. Certain rules and regulations also keep premiums below some level.

If we restrict ourselves to the heuristic estimate of Section 3.1.1, the problem turns out to
be simple. Namely, we should find n from (3.1.10) for given $\beta$ and $\theta$, and the distribution
of X’s. If the group is homogeneous (X’s are identically distributed), the coefficient of
variation $\bar{\sigma}_n/\bar{m}_n = \sigma/m$, where $\sigma$ and m are the standard deviation and the mean of a separate
X. In this case, we immediately get from (3.1.10) that

$$n \approx \left(\frac{q_{\beta s}\,\sigma}{m\theta}\right)^2. \tag{3.3.1}$$


EXAMPLE 1. Let us return to Example 3.1.1-2. For $\beta = 0.9$ and n = 2000, we got
$\theta \approx 0.100$. Assume that potential buyers of the insurance product agree only on premiums
with $\theta = 0.07$. How many clients should the company have in this case in order to keep the
same level $\beta$? From (3.3.1), we have

$$n \approx \left(\frac{1.281 \cdot \sqrt{0.0308}}{0.05 \cdot 0.07}\right)^2 \approx 4122. \tag{3.3.2}$$

There is another way to obtain the same answer. Since we have only changed $\theta$, we could
use the answer from Example 3.1.1-2 and, because n is proportional to $\theta^{-2}$, write

$$n \approx 2000 \cdot \frac{(0.1005)^2}{(0.07)^2} \approx 4122.$$


In any case, the answer above represents only a rough approximation, and eventually we 
can say only that “n should be around 4000”. 
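
Relation (3.3.1) is a one-line computation. The Python sketch below is an illustration only; the function name is ours. Depending on how the quantile and the intermediate values are rounded, the result varies by a few units around 4120-4130, which agrees with the remark that only the order of magnitude is meaningful here.

```python
from math import sqrt
from statistics import NormalDist


def clients_needed(m: float, var: float, theta: float, beta: float) -> float:
    """Heuristic number of clients from (3.3.1): n = (q * sigma / (m * theta))^2."""
    q = NormalDist().inv_cdf(beta)  # beta-quantile of the standard normal distribution
    return (q * sqrt(var) / (m * theta)) ** 2


n_est = clients_needed(m=0.05, var=0.0308, theta=0.07, beta=0.9)
print(f"n is about {n_est:.0f}")
```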


Route 1 => page 199 


A rigorous approximation leads to somewhat more complicated calculations since in this
case, n should be a solution to the inequality (3.2.6):

$$\Phi\left(\theta E\{S_n\}/\sqrt{\mathrm{Var}\{S_n\}}\right) \ge \beta + \Delta_n. \tag{3.3.3}$$

Let the group be homogeneous, and let m, $\sigma^2$, and $\mu$ be the mean, the variance, and the third
moment for a separate X. Then $E\{S_n\} = mn$, $\mathrm{Var}\{S_n\} = \sigma^2 n$, and $\Delta_n = C\mu/(\sigma^3\sqrt{n})$.
Inequality (3.3.3) may be rewritten as

$$\Phi(\theta\sqrt{n}\,m/\sigma) \ge \beta + C\mu/(\sigma^3\sqrt{n}). \tag{3.3.4}$$

The l.-h.s. of (3.3.4) is an increasing function of n, while the r.-h.s. is decreasing. So,
the numerical solution for the inequality (3.3.4) is easy: starting with a “non-large” n, we
increase it until we find the first n for which (3.3.4) becomes true. Certainly, if we do not
want the procedure to be long, we should not start from n = 1, but from a value of n closer
to the solution we are looking for. For example, we can take, as a starting value of n, the
estimate for n obtained by the heuristic approach, and first choose a rough step in n. Once
we find a rough estimate, we may adjust our calculations.


FIGURE 9. An Excel worksheet for Example 3.3-2.
[Two worksheets listing n, $g_1(n)$, $g_2(n)$, and an indicator for $g_1 > g_2$.
(a) n = 4000, 4100, ..., 6000; here $g_1(5300) = 0.927307 < g_2(5300) = 0.928461$ and
$g_1(5400) = 0.929178 > g_2(5400) = 0.928196$.
(b) “Adjusting calculations”: n = 5350, 5351, ..., 5370; here $g_1(5353) = 0.928305 < g_2(5353) = 0.928320$
and $g_1(5354) = 0.928324 > g_2(5354) = 0.928317$.]


EXAMPLE 2. Consider the situation of Example 1. We have m = 0.05, $\sigma \approx \sqrt{0.0308} \approx 0.175$,
$\mu \approx 0.020$, $\theta = 0.07$, and C = 0.56. The reader is invited to verify that, for the
particular numbers above, (3.3.4) may be written as

$$\Phi(0.02\sqrt{n}) \ge 0.9 + 2.072/\sqrt{n}. \tag{3.3.5}$$

In the Excel worksheet in Fig. 9, the l.-h.s. of (3.3.5) is denoted by $g_1(n)$, and the r.-h.s. by
$g_2(n)$. In the first worksheet in Fig. 9a, we start with n = 4000, the step in n is 100, and the
(rough) estimate is 5300: the inequality still fails at n = 5300 and holds at n = 5400. In the
second worksheet in Fig. 9b, we start with 5350, the step is 1, and the estimate is 5353: the
inequality fails at n = 5353 and first holds at n = 5354. Certainly, a reasonable answer would be
“around 5350 or so.”
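
The two-stage search carried out in the worksheet (a rough step first, then step 1) can be sketched in Python as follows. This is an illustration only: the function names are ours, and the sketch assumes that the inequality (3.3.5) fails at the starting value of n.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal d.f.


def g1(n: int) -> float:
    """L.-h.s. of (3.3.5)."""
    return PHI(0.02 * sqrt(n))


def g2(n: int) -> float:
    """R.-h.s. of (3.3.5)."""
    return 0.9 + 2.072 / sqrt(n)


def first_n(start: int, coarse_step: int = 100) -> int:
    """Smallest n >= start with g1(n) >= g2(n), assuming the inequality fails at start."""
    n = start
    while g1(n) < g2(n):   # rough pass with a large step
        n += coarse_step
    n -= coarse_step       # last rough point at which the inequality fails
    while g1(n) < g2(n):   # adjusting pass with step 1
        n += 1
    return n


n_min = first_n(4000)
print(n_min, g1(n_min), g2(n_min))
```

Since $g_1$ is increasing and $g_2$ is decreasing in n, the inequality holds for all n beyond the value found.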


3.4 Approximations taking into account the asymmetry of S 


A r.v. X and its distribution are said to be symmetric about their mean m if
$P(X > m+x) = P(X < m-x)$ for all x > 0. The r.v. $S_n$ is not necessarily symmetric. Let, for example, X’s
be exponential (and hence, non-symmetric). Then, for any fixed n, the distribution of $S_n$ is
a $\Gamma$-distribution and consequently, it is also non-symmetric. Another matter is that when n
is growing, the distribution of $S_n$ approaches a normal distribution, which is symmetric, so
the asymmetry of $S_n$ is diminishing. In this section, we discuss how to take the asymmetry
mentioned into account.




3.4.1 The skewness coefficient 


Probably the most popular, though somewhat rough, characteristic of asymmetry of a r.v.
X is the so-called skewness coefficient

$$\gamma = \gamma_X = \frac{\kappa_3}{\sigma^3},$$

where $\sigma^2 = \mathrm{Var}\{X\}$, $\kappa_3 = E\{(X-m)^3\}$, the third central non-absolute moment, and
$m = E\{X\}$.

(The notation above is consistent with the notation for cumulants in Section 0.4.5.2, since
the third cumulant $\kappa_3$ coincides with the third central moment; see Section 0.4.5.2.)

First, note that if X is symmetric about m, then $\kappa_3 = 0$ and the skewness $\gamma = 0$. To clarify
this, assume that X has a density f(x). If X is symmetric about m, the density f(x) is also
symmetric about m. Consider the centered r.v. $\tilde{X} = X - m$. Its density is $\tilde{f}(x) = f(x+m)$
and is symmetric about 0. Then

$$\kappa_3 = E\{(X-m)^3\} = E\{\tilde{X}^3\} = \int_{-\infty}^{\infty} x^3 \tilde{f}(x)\,dx = 0$$

as an integral of an odd function.

The reader is encouraged to show that the coefficient $\gamma$ is dimensionless and invariant
under shifting and change of scale: $\gamma_{X+c} = \gamma_X$ for any number c, and $\gamma_{cX} = \gamma_X$ for any
c > 0. (For c < 0, the sign changes: $\gamma_{cX} = -\gamma_X$.)

We say that the distribution of X is skewed to the right if $\gamma_X > 0$. If $\gamma_X < 0$, the r.v. X is
said to be skewed to the left.


EXAMPLE 1. Let X have the $\Gamma$-distribution with parameters $(a,\nu)$. First, since a is
a scale parameter, skewness does not depend on a and we can set a = 1. Now, $E\{X\} = \nu$,
$E\{X^2\} = \nu(\nu+1)$, $E\{X^3\} = \nu(\nu+1)(\nu+2)$ (see Section 0.3.2.3) and hence,
$E\{(X - E\{X\})^3\} = E\{(X-\nu)^3\} = E\{X^3\} - 3\nu E\{X^2\} + 3\nu^2 E\{X\} - \nu^3 = 2\nu$. Since $\mathrm{Var}\{X\} = \nu$, we have
$\gamma = 2\nu/\nu^{3/2} = 2/\sqrt{\nu}$.


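Let $S_n = X_1 + \ldots + X_n$, where the X’s are independent r.v.’s. Let us set $m_i = E\{X_i\}$,
$\sigma_i^2 = \mathrm{Var}\{X_i\}$, and $\kappa_{3i} = E\{(X_i - E\{X_i\})^3\}$. Then

$$E\{(S_n - E\{S_n\})^3\} = \sum_{i=1}^{n} \kappa_{3i}.$$

Indeed, let $\tilde{X}_i = X_i - m_i$. Then $S_n - E\{S_n\} = \tilde{X}_1 + \ldots + \tilde{X}_n$. Clearly, $E\{\tilde{X}_i\} = 0$, and hence

$$E\{(S_n - E\{S_n\})^3\} = E\{(\tilde{X}_1 + \ldots + \tilde{X}_n)^3\} = \sum_{i=1}^{n} E\{\tilde{X}_i^3\},$$

because, for distinct i, j, k, all terms $E\{\tilde{X}_i^2\tilde{X}_j\} = E\{\tilde{X}_i^2\}E\{\tilde{X}_j\} = 0$, and
$E\{\tilde{X}_i\tilde{X}_j\tilde{X}_k\} = E\{\tilde{X}_i\}E\{\tilde{X}_j\}E\{\tilde{X}_k\} = 0$.
Since $\mathrm{Var}\{S_n\} = \sum_{i=1}^{n} \sigma_i^2$,

$$\gamma_{S_n} = \left(\sum_{i=1}^{n} \kappa_{3i}\right) \Big/ \left(\sum_{i=1}^{n} \sigma_i^2\right)^{3/2} = \frac{1}{\sqrt{n}}\,\frac{\bar{\kappa}_{3n}}{\bar{\sigma}_n^3}, \tag{3.4.1}$$

where the average third moment $\bar{\kappa}_{3n} = \frac{1}{n}\sum_{i=1}^{n} \kappa_{3i}$ and the average variance
$\bar{\sigma}_n^2 = \frac{1}{n}\sum_{i=1}^{n} \sigma_i^2$.
Let us consider the case when the X’s are identically distributed. Then $\sigma_i^2$ is equal to
some $\sigma^2$, $\kappa_{3i}$ is equal to some $\kappa_3$, and (3.4.1) implies that

$$\gamma_{S_n} = \frac{\gamma}{\sqrt{n}},$$

where $\gamma$ is the skewness coefficient for each X. Thus, $\gamma_{S_n} \to 0$ as $n \to \infty$, which is not surprising
because $S_n$ approaches a symmetric r.v.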


3.4.2 The Γ-approximation


Next, we discuss how to approximate the distribution of $S_n$ taking into account possible
asymmetry. We start with the somewhat naive but frequently well-working $\Gamma$-approximation.

Many sample distributions of aggregate claims have approximately the same shape as
the gamma distribution; namely, they are skewed to the right and their histograms have a
unique maximum. So, we can try to approximate the distribution of $S_n$ by the distribution
of a r.v. $Y_z = z + Y$, where Y has the $\Gamma$-density $f_{a\nu}(x)$ [see (1.1.10)] and z is a number.

The r.v. $Y_z$ is called a translated $\Gamma$-r.v. Its density is equal to $f_{a\nu}(x-z)$ by virtue of the
rule (0.2.6.1).

We choose z, a, and $\nu$ in such a way that

$$E\{Y_z\} = E\{S_n\}, \quad \mathrm{Var}\{Y_z\} = \mathrm{Var}\{S_n\}, \quad \gamma_{Y_z} = \gamma_{S_n}. \tag{3.4.2}$$

That is, we require the coincidence of the first three moments. The first two conditions
from (3.4.2) give

$$z + \frac{\nu}{a} = n\bar{m}_n, \qquad \frac{\nu}{a^2} = n\bar{\sigma}_n^2.$$

From Example 1, formula (3.4.1), and the fact that skewness does not depend on shifting,
it follows that

$$\nu = \frac{4}{\gamma_{S_n}^2} = \frac{4n\bar{\sigma}_n^6}{\bar{\kappa}_{3n}^2}.$$

It is now easy to determine that

$$a = \frac{2\bar{\sigma}_n^2}{\bar{\kappa}_{3n}}, \qquad z = n\bar{m}_n - \frac{2n\bar{\sigma}_n^4}{\bar{\kappa}_{3n}}.$$

If the X’s are identically distributed, then

$$a = \frac{2\sigma^2}{\kappa_3} = \frac{2}{\gamma\sigma}, \qquad z = n\left(m - \frac{2\sigma^4}{\kappa_3}\right) = n\left(m - \frac{2\sigma}{\gamma}\right), \qquad \nu = \frac{4n}{\gamma^2}, \tag{3.4.3}$$

where $\gamma$ is the skewness coefficient for each X.
Examples are given in Exercises 56-57. 
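
The parameters in (3.4.3) are easy to compute and to check against the moment conditions (3.4.2). The Python sketch below is an illustration only: the function name and the numerical values (an indicator r.v. with q = 0.1 and n = 200) are ours and are not taken from the exercises. It assumes identically distributed X’s with a positive third central moment.

```python
from math import sqrt


def translated_gamma_params(n: int, m: float, var: float, kappa3: float):
    """Return (z, a, nu) of the translated Gamma r.v. z + Y from (3.4.3).

    m, var, kappa3 are the mean, variance, and third central moment of one X;
    kappa3 must be positive (distribution skewed to the right).
    """
    gamma = kappa3 / var**1.5            # skewness coefficient of one X
    nu = 4 * n / gamma**2
    a = 2 * var / kappa3
    z = n * (m - 2 * var**2 / kappa3)
    return z, a, nu


# Hypothetical illustration: X = 1 with probability q and X = 0 otherwise.
q, n = 0.1, 200
m, var, kappa3 = q, q * (1 - q), q * (1 - q) * (1 - 2 * q)
z, a, nu = translated_gamma_params(n, m, var, kappa3)

# The three conditions (3.4.2): mean, variance, and skewness of z + Y and of S_n.
print(z + nu / a, n * m)
print(nu / a**2, n * var)
print(2 / sqrt(nu), kappa3 / var**1.5 / sqrt(n))
```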




3.4.3 Asymptotic expansions and Normal Power (NP) approximation 


The next approximation taking into account a possible asymmetry of $S_n$ concerns the
following refinement of the CLT.

We adopt the notation of Sections 3.1.1-3.2 and assume that the X’s are independent and,
for the sake of simplicity, identically distributed. For the relations below to be true for
the case of non-identically distributed r.v.’s, moment characteristics should be replaced by
certain average characteristics.

Let $m_i = m$, $\sigma_i = \sigma$, $\mu_i = \mu$, $\kappa_{3i} = \kappa_3$. As before, set $\gamma = \kappa_3/\sigma^3$.

In our case, Theorem 6 implies that

$$|F_n^*(x) - \Phi(x)| \le \frac{C}{\sqrt{n}}\,\frac{\mu}{\sigma^3}.$$

In particular, this means that

$$F_n^*(x) = \Phi(x) + O\left(\frac{1}{\sqrt{n}}\right). \tag{3.4.4}$$

(For the notation O(·) see Appendix, Section 4.1.2.) In other words, the remainder, or the
error of the approximation in the CLT, has the order $1/\sqrt{n}$. As follows from the corresponding
theorems of Probability Theory, under rather mild conditions, the last relation may be
replaced by the following more precise representation:

$$F_n^*(x) = \Phi(x) + \frac{\gamma}{6\sqrt{n}}(1 - x^2)\varphi(x) + O\left(\frac{1}{n}\right), \tag{3.4.5}$$

where $\varphi(x) = (2\pi)^{-1/2}\exp\{-x^2/2\}$, the standard normal density. The representation (3.4.5)
is called an asymptotic (or Edgeworth’s) expansion. A sufficient condition for (3.4.5) to be
true is that the X’s have a bounded density and a finite moment $E\{X_i^4\}$. The proof is not
simple; it may be found, for example, in [14], [38], [100].

The expansion (3.4.5) contains more information about the accuracy of normal
approximation. The second term is given by a precise formula and has the order $1/\sqrt{n}$. The remainder
is not written explicitly but it has a higher order than it does in (3.4.4), namely, $1/n$. Precise
bounds for the remainder may be found, e.g., in [14] and [100].

Note also that we can continue such an expansion, getting precise terms of the orders
$1/n$, $1/n^{3/2}$, and so on, up to a remainder $O(n^{-k/2})$ for an arbitrary k. See, e.g., again [14], [38],
[100]. Here, we restrict ourselves to the first term of the asymptotic expansion.

It is natural and important that the term $\frac{\gamma}{6\sqrt{n}}(1 - x^2)\varphi(x)$ in (3.4.5) involves the skewness
coefficient. This term vanishes when $\gamma = 0$, and in any case tends to zero as $n \to \infty$. As
was already noted, it is not surprising because $F_n^*(x)$ tends to a symmetric, namely, normal
distribution.

Thus, if we decide to neglect terms of order $1/n$, we adopt the approximation

$$F_n^*(x) \approx \Phi(x) + \frac{\gamma}{6\sqrt{n}}(1 - x^2)\varphi(x). \tag{3.4.6}$$




As will be shown later, the expansion (3.4.5) is equivalent to the following two (equiva- 
lent) representations: 


RO= (atl 2) -o(2), 3.4.7) 


r; (y+ ra -1))= e6)+0(2). (3.4.8) 


We will see below why it is convenient to write in (3.4.8) y rather than x. 

Approximations based on (3.4.7) or (3.4.8) are called normal power (NP) approximations.
As we will see below, the latter formula is convenient for estimating quantiles of
$S_n$. We justify (3.4.7) and (3.4.8) at the end of this subsection, and now let us return to the
situation of Section 3.1.1 and loading coefficients.

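We saw in Section 3.1.1 that the insurance company does not suffer a loss with probability
β if the loading coefficient θ satisfies $P\left(S_n^* \le \theta E\{S_n\}/\sqrt{\mathrm{Var}\{S_n\}}\right) = \beta$. In our
situation, this is equivalent to

$$F_n^*\left(\frac{\theta m\sqrt{n}}{\sigma}\right) = \beta. \tag{3.4.9}$$

Applying (3.4.8), we see that for (3.4.9) to be true with an accuracy of $O(1/n)$, it suffices to
set

$$\frac{\theta m\sqrt{n}}{\sigma} = y + \frac{\gamma}{6\sqrt{n}}(y^2 - 1)$$

with $y = q_{\beta s}$, the β-quantile of the standard normal distribution. We get from this that

$$\theta = \frac{\sigma}{m\sqrt{n}}\left(q_{\beta s} + \frac{\gamma}{6\sqrt{n}}(q_{\beta s}^2 - 1)\right) \tag{3.4.10}$$

(compare with (3.1.6)).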


EXAMPLE 1. Consider a homogeneous group of n = 200 clients. The probability of the 
loss event for each client is q = 0.1, and the payment if a loss event occurs is the certain 
amount z = 10. 

In this case, γ does not depend on z and equals $(1 - 2q)/\sqrt{q(1 - q)} \approx 2.66$ (see Exercise
54). For β = 0.95, formula (3.4.10) gives θ ≈ 0.360, while the “classical” formula (3.1.10)
gives 0.348. In Exercise 59, the reader is encouraged to provide an Excel worksheet to
solve this problem for various values of n and q.
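The numbers of this example are easy to recompute. The sketch below evaluates both the classical loading $q_{\beta s}\sigma/(m\sqrt{n})$ and the NP-corrected loading, in which $q_{\beta s}$ is replaced by $q_{\beta s} + \frac{\gamma}{6\sqrt{n}}(q_{\beta s}^2 - 1)$. The function name is hypothetical, and the quantile is taken from Python's standard library rather than from a table, so the last digit may differ slightly from the rounded values quoted above.

```python
import math
from statistics import NormalDist

def loading_coefficients(n, q, z, beta):
    """Return (classical, NP-corrected) loading coefficients for a homogeneous
    portfolio of n contracts, each paying the fixed amount z with probability q."""
    m = z * q                                       # mean payment per contract
    sigma = z * math.sqrt(q * (1 - q))              # its standard deviation
    gamma = (1 - 2 * q) / math.sqrt(q * (1 - q))    # its skewness coefficient
    y = NormalDist().inv_cdf(beta)                  # beta-quantile of the standard normal
    scale = sigma / (m * math.sqrt(n))
    theta_classical = scale * y
    theta_np = scale * (y + gamma / (6 * math.sqrt(n)) * (y * y - 1))
    return theta_classical, theta_np

theta_classical, theta_np = loading_coefficients(n=200, q=0.1, z=10, beta=0.95)
print(round(theta_classical, 3), round(theta_np, 3))
```

Changing n and q in the last call gives the kind of sensitivity analysis the exercise asks for.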


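In conclusion, we show that (3.4.7) and (3.4.8) are equivalent to (3.4.5).
First, set $\varepsilon = \frac{\gamma}{6\sqrt{n}}(1 - x^2)$ and note that such an $\varepsilon = O(1/\sqrt{n})$. Writing the Taylor expansion
for Φ(x + ε), we have

$$\Phi\left(x + \frac{\gamma}{6\sqrt{n}}(1 - x^2)\right) = \Phi(x + \varepsilon) = \Phi(x) + \Phi'(x)\varepsilon + O(\varepsilon^2) = \Phi(x) + \varphi(x)\varepsilon + O(\varepsilon^2)$$

$$= \Phi(x) + \varphi(x)\frac{\gamma}{6\sqrt{n}}(1 - x^2) + O(\varepsilon^2) = \Phi(x) + \varphi(x)\frac{\gamma}{6\sqrt{n}}(1 - x^2) + O\left(\frac{1}{n}\right).$$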




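So, we have come to the r.-h.s. of (3.4.5).
To obtain (3.4.8), let us set $g(x) = (1 - x^2)\varphi(x)$ for a moment, and x = y + ε, where now
$\varepsilon = \frac{\gamma}{6\sqrt{n}}(y^2 - 1)$. Making use of (3.4.5) and Taylor's expansions in ε for Φ(y + ε) and
g(y + ε), we have

$$F_n^*(y + \varepsilon) = \Phi(y + \varepsilon) + \frac{\gamma}{6\sqrt{n}}g(y + \varepsilon) + O\left(\frac{1}{n}\right) = \Phi(y) + \varphi(y)\varepsilon + \frac{\gamma}{6\sqrt{n}}g(y) + O\left(\frac{1}{n}\right).$$

It remains to observe that ε was chosen exactly in a way that $\varphi(y)\varepsilon + \frac{\gamma}{6\sqrt{n}}g(y) = 0$, so we
have arrived at (3.4.8). ∎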


4 SOME GENERAL PREMIUM PRINCIPLES 


In this section, we touch on some general principles of determining risk premiums; that 
is, premiums taking into account riskiness incurred by an insurance organization. Potential 
clients of the organization may or may not agree with the premiums suggested. The latter 
case will lead to a further adjusting process, but we do not explore this issue here. Some 
possible preferences of individuals were discussed in Chapter 1. 

Speaking of risk premiums, we mean that profits and expenses are not included in calcu- 
lations, and we determine a pure premium for risk involved. 

Two situations are usually distinguished. 

In a short term insurance, a single premium is paid at the beginning of the period under 
consideration to cover the future claim (risk). In this case, the premium is a function of the 
random claim, and this function is what should be determined. 

In the case of life insurance or, for example, pension plans, the policy is based not on a 
single premium but rather on a sequence of premium payments to be carried out at a certain 
rate. For example, we may talk about monthly premiums. 

We call such a type of premium payments premium annuity and explore it later, in Sec- 
tion 10.1. In this section, we consider the first type of premiums mentioned. 


For a particular contract, denote by X the r.v. of the possible payment of a company
(the company's risk), and let P be the corresponding premium. Without stating it explicitly
each time, we assume that P is a function of X, which we write as P = π(X). Certainly, this
function assumes numerical values.

We list below and discuss some general premium principles; that is, the rules of determining
the function π(·).

Note that although X is the payment corresponding to a single contract, the rule of premium
determination for a particular contract may (and should) depend on the whole risk
portfolio with which the company is dealing. In particular, it concerns the first principle
below.


1. The expected value principle:

$$P = (1 + \theta)E\{X\},$$

where θ > 0 is a relative security loading coefficient.


We systematically considered this principle in this chapter and continue in Chapters 3.
The choice of a value of θ in all the models of these chapters is determined by certain
characteristics of the portfolio. For example, θ may depend on the number of the contracts
of the portfolio; see, for instance, (3.1.10).


2. The variance principle:

$$P = E\{X\} + \lambda\,\mathrm{Var}\{X\}, \tag{4.1}$$

where λ > 0 is a “weight” assigned to the variance. The sign “+” above indicates
the risk aversion of the decision maker: the larger the variance (riskiness), the larger
the premium should be.


This is, actually, a version of the mean-variance principle that we discussed in detail in
Section 1.1.2.5. The only difference is that now we apply it to premiums, which is reflected
by the fact that the weight λ is positive.

As was noted in Section 1.1.2.5, one should use variance as a risk measure with caution: 
it may lead to conclusions contradicting common sense. Consider a simple example very 
close to Example 1.1.2.5-1. 


EXAMPLE 1. Let X = a > 1 with probability one, and let a random loss

$$Y = \begin{cases} 0 & \text{with probability } 1/a, \\ a & \text{with probability } 1 - 1/a. \end{cases}$$

Thus, π(X) = E{X} + λVar{X} = a + λ·0 = a, while π(Y) = E{Y} + λVar{Y} =
(a − 1) + λ(a − 1) = (1 + λ)a − (1 + λ).

Clearly, whatever the positive λ is, we can choose a large enough for π(Y) > π(X). On
the other hand, X ≥ Y with probability one, and common sense dictates that the premium
for X should not be smaller than that for Y.
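The loss of monotonicity can be checked numerically. The following sketch computes the variance-principle premium for a discrete r.v. and compares π(X) and π(Y) for several values of a. The weight λ = 0.5 and the helper name are illustrative choices of ours, not part of the example; with this λ, solving (1 + λ)(a − 1) > a shows that the inversion occurs once a > (1 + λ)/λ = 3.

```python
def variance_principle(values, probs, lam):
    """Premium E{X} + lam * Var{X} for a discrete r.v. with the given values and probabilities."""
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    return mean + lam * var

lam = 0.5   # illustrative weight
for a in (2.0, 4.0, 10.0):
    premium_x = variance_principle([a], [1.0], lam)                       # X = a with probability one
    premium_y = variance_principle([0.0, a], [1 / a, 1 - 1 / a], lam)     # Y = 0 or a
    print(a, premium_x, premium_y, premium_y > premium_x)
```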


So, the rule (4.1) is not monotone in the sense that the relation X ≥ Y does not imply
that π(X) ≥ π(Y). In Section 1.1.2.5, we discussed situations when such a criterion may
nevertheless be applied, and when it should not be used. We will continue this discussion
below, but first consider


3. The standard deviation principle:

$$P = E\{X\} + \lambda\sqrt{\mathrm{Var}\{X\}} = E\{X\} + \lambda\sigma_X, \tag{4.2}$$

where $\sigma_X$ denotes the standard deviation of X and λ > 0 is a weight.




This rule has the same shortcoming as Rule 2 above: it may assign a larger premium
to an obviously smaller risk. For instance, for X and Y from Example 1, π(X) = a, while
$\pi(Y) = a - 1 + \lambda\sqrt{a - 1} > a$ for sufficiently large a.

However, if we deal with a portfolio with a large number of separate contracts, the situa- 
tion may change. In this case, the total loss may be well approximated by a normal r.v. for 
which the mean-variance and mean-standard-deviation criteria work well (see again Sec- 
tion 1.1.2.5 and also Section 3.1.2 where we consider this principle in the case of a large 
portfolio). So, if we apply the rule (4.2) to the whole portfolio, the result may be quite 
reasonable. 

This concerns both criteria 2 and 3, but as we saw in this chapter, when applying normal
approximation, it is convenient to work with standard deviations. In particular, when considering
the case of a homogeneous portfolio in Sections 3.1.2 and 1.2.1, we set λ in (4.2)
equal to $q_{\beta s}/\sqrt{n}$, where n is the number of contracts in the portfolio, β is a given security
level, and $q_{\beta s}$ is the β-quantile of the standard normal distribution.
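As a small illustration of this choice of λ, the sketch below computes the premium per contract under the standard deviation principle with $\lambda = q_{\beta s}/\sqrt{n}$. The input values (m = 1, σ = 3, n = 200, β = 0.95) are an assumption made for the sake of the example; they correspond to a homogeneous group with q = 0.1 and a fixed payment z = 10. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def per_contract_premium(m, sigma, n, beta):
    """Standard deviation principle (4.2) with lambda = q_beta / sqrt(n)."""
    lam = NormalDist().inv_cdf(beta) / math.sqrt(n)
    return m + lam * sigma

premium = per_contract_premium(m=1.0, sigma=3.0, n=200, beta=0.95)
implied_loading = premium / 1.0 - 1.0     # the theta for which premium = (1 + theta) * m
print(premium, implied_loading)
```

The implied loading coincides with the classical normal-approximation loading $q_{\beta s}\sigma/(m\sqrt{n})$, which is what one would expect from applying (4.2) to the whole portfolio.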


4. The mean value principle. This is a term in use, though in the utility theory framework
of Section 1.3, the term ‘certainty equivalent principle’ would be more natural.
The rule under consideration is specified by a given increasing function g(·) and is
defined by the relation

$$g(P) = E\{g(X)\}, \quad \text{or} \quad P = \pi(X) = g^{-1}(E\{g(X)\}). \tag{4.3}$$

If we view g(x) as a utility function, we may view P as the certainty equivalent of X.


We chose the notation g(x) instead of the traditional notation u(x) in order to emphasize 
that g(x) should not be interpreted as the utility function of the company. 

If the company had expected the future income X, and if g(x) had been the utility function 
of the company, then P from (4.3) would have been the amount of money equivalent to X. 
However, the company does not receive X but rather pays it, which is not the same. The 
premium P from (4.3) is what the company would agree to receive for paying (covering) 
the loss X in the future. The function g reflects the attitude of the company to losses (while 
a utility function deals with income), and consequently, if the company is a risk averter, 
g(x) should be convex (!) assigning larger weights to large values of the loss. Later, we 
will also consider a principle based on the company’s utility function. 


EXAMPLE 2. Let $g(x) = e^{\beta x}$, where a parameter β > 0. This is a convex function. In
accordance with (4.3),

$$P = \frac{1}{\beta}\ln E\{e^{\beta X}\} = \frac{1}{\beta}\ln M_X(\beta), \tag{4.4}$$

where $M_X(z)$ is the m.g.f. of X. We came to this rule in Section 1.3.2; see (1.3.2.6). The
particular rule (4.4) is called the exponential principle. By Jensen's inequality (1.3.4.2),

$$P = \frac{1}{\beta}\ln E\{e^{\beta X}\} \ge \frac{1}{\beta}\ln e^{E\{\beta X\}} = \frac{1}{\beta}\,\beta E\{X\} = E\{X\},$$

which reflects the risk aversion of the company. In Exercise 65, we continue this example
and, in particular, compare (4.4) with the criterion $-\frac{1}{\beta}\ln E\{e^{-\beta X}\}$ from Section 1.3.1.3.
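For a discrete loss, the exponential premium (4.4) is a one-line computation. The sketch below uses a hypothetical loss equal to 10 with probability 0.1 and zero otherwise (our choice, not the text's) and prints the premium for several values of β next to the mean loss, so that the inequality P ≥ E{X} can be observed directly.

```python
import math

def exponential_premium(values, probs, beta):
    """Exponential principle (4.4): P = ln(M_X(beta)) / beta for a discrete X."""
    mgf = sum(p * math.exp(beta * v) for v, p in zip(values, probs))
    return math.log(mgf) / beta

values, probs = [0.0, 10.0], [0.9, 0.1]      # hypothetical loss distribution
mean = sum(v * p for v, p in zip(values, probs))
for beta in (0.01, 0.1, 0.5):
    print(beta, exponential_premium(values, probs, beta), mean)
```

In this illustration the premium is close to the mean for small β and grows with β, which is consistent with reading β as a measure of the company's aversion to large losses.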




5. The utility equivalence principle. Now, we assume that the company’s preferences 
correspond to the expected utility maximization with a utility function u(x). In this 
case, P is a solution to the equation 


E{u(w+P—X)}=u(w), (4.5) 


where w is the initial reserve corresponding to the insurance under consideration. 
We have already applied this principle for premium determination in Section 1.3.2, 
considering an insurance company and a separate client, as well. 


5a. The zero utility principle. Setting w = 0 in (4.5), we come to P as a solution to the
equation

$$E\{u(P - X)\} = u(0). \tag{4.6}$$

In this case, we compare the utility of the profit P − X with the utility of zero profit.


5b. The exponential principle. Setting $u(x) = -e^{-\beta x}$ with β > 0 in (4.6), we come again
to (4.4). We already considered this principle in Section 1.3.2.


6. The Escher principle (first suggested by H. Bühlmann in [21]):

$$P = \frac{E\{Xe^{\alpha X}\}}{E\{e^{\alpha X}\}}, \tag{4.7}$$

where α > 0 is a parameter.


Such premiums arise in many models—in particular, in some reinsurance schemes, or as 
premiums minimizing losses in some particular cases. 

One can view (4.7) as follows. Assume, for simplicity, that X has a density f(x). If we
had chosen a premium equal to the mean value of X, we would have computed $\int_0^\infty x f(x)\,dx$.
If we assign a weight w(x) to each value x, we will deal with $\int_0^\infty x w(x) f(x)\,dx$.

The last integral looks like the expected value with respect to another density, namely,
w(x)f(x). However, for this function to be indeed a density, the integral $\int_0^\infty w(x) f(x)\,dx$
should be equal to one, which is not true. To fix it, we should normalize the function
w(x)f(x), that is, divide it by $\int_0^\infty w(x) f(x)\,dx$. Eventually, it leads to

$$P = \frac{\int_0^\infty x w(x) f(x)\,dx}{\int_0^\infty w(x) f(x)\,dx}. \tag{4.8}$$

The definition (4.7) corresponds to the particular case $w(x) = e^{\alpha x}$. As a matter of fact, we
can use (4.8) for various weighting functions w(x).

The reader who is familiar with the material of Section 1.4.2 recognized in the above
criterion a particular case of the weighted utility criterion, which we considered in the
section mentioned with examples, properties, and axioms.


The Escher parameter α reflects the degree of risk aversion. For α = 0, the premium
P = E{X}, and the premium P as a function of α is increasing. (Detailed advice on how
to prove it is given in Exercise 64.) Hence, P > E{X} for α > 0, and the larger α is, the
more P differs from the mean loss.
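The weighting interpretation translates directly into code. The sketch below implements a discrete analogue of (4.8), with sums in place of integrals, and obtains the Escher premium (4.7) as the special case $w(x) = e^{\alpha x}$. The loss distribution and the function names are illustrative assumptions.

```python
import math

def weighted_premium(values, probs, weight):
    """Discrete analogue of (4.8): sum of x w(x) p(x) divided by sum of w(x) p(x)."""
    numerator = sum(v * weight(v) * p for v, p in zip(values, probs))
    denominator = sum(weight(v) * p for v, p in zip(values, probs))
    return numerator / denominator

def escher_premium(values, probs, alpha):
    """Escher principle (4.7) as the special case w(x) = exp(alpha * x)."""
    return weighted_premium(values, probs, lambda x: math.exp(alpha * x))

values, probs = [0.0, 1.0, 5.0], [0.7, 0.2, 0.1]     # hypothetical loss distribution
for alpha in (0.0, 0.25, 0.5, 1.0):
    print(alpha, escher_premium(values, probs, alpha))
```

For α = 0 the weights are all equal to one and the premium reduces to the mean loss; larger α shifts weight toward the large losses.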




7. The Swiss principle (first suggested in [23] by H. Bühlmann, B. Gagliardi, H. Gerber,
and E. Straub). This principle unifies some of the rules above. Let λ ∈ [0, 1] and g(x)
be an increasing function. Then we define P as a solution to the equation

$$E\{g(X - \lambda P)\} = g((1 - \lambda)P). \tag{4.9}$$

If λ = 0, then $P = g^{-1}(E\{g(X)\})$, which amounts to the mean value Principle 4.

If λ = 1, we come to the zero utility Principle 5a with u(x) = g(−x) or u(x) = −g(−x).
Formally, both versions lead to the same result since if we insert the latter function into
(4.9), the minus will cancel. However, the choice of u(x) = −g(−x) is more natural. We
saw that in the mean value principle, it is reasonable to choose a convex g(x). Then −g(−x)
will be concave. (Consider the second derivatives, although it would be more illustrative to
draw a typical graph of a convex g(x), and then realize how the graph u(x) = −g(−x) will
look.)


With λ = 1 and $g(x) = xe^{\alpha x}$, we come to the Escher principle. Indeed, in this case,

$$E\{g(X - P)\} = E\{(X - P)e^{\alpha(X - P)}\} = e^{-\alpha P}\left[E\{Xe^{\alpha X}\} - PE\{e^{\alpha X}\}\right].$$

After inserting it into (4.9), the factor $e^{-\alpha P}$ will cancel since g(0) = 0. Solving (4.9) as an
equation in P, we will come to (4.7).

Note, however, that this is a formal argument. The function $xe^{\alpha x}$ is convex not for all x's,
and the justification connected with a weighting function is more convincing.


The case 0 < λ < 1 may be considered intermediate.
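In general, equation (4.9) has no closed-form solution, but for a discrete loss it is easy to solve numerically. The sketch below uses plain bisection; it assumes that g is increasing and that X is a discrete r.v., in which case the difference between the two sides of (4.9) changes sign between the smallest and the largest value of X. The function name, the tolerance, and the loss distribution are our own illustrative choices. As a sanity check, for $g(x) = e^{\beta x}$ the factor $e^{-\beta\lambda P}$ cancels on both sides of (4.9), so the solution should coincide with the exponential premium (4.4) for every λ.

```python
import math

def swiss_premium(values, probs, g, lam, tol=1e-10):
    """Solve E{g(X - lam*P)} = g((1 - lam)*P) for P by bisection (discrete X)."""
    def h(p):
        expected = sum(q * g(v - lam * p) for v, q in zip(values, probs))
        return expected - g((1 - lam) * p)

    lo, hi = min(values), max(values)    # h(lo) >= 0 >= h(hi) for an increasing g
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) > 0:
            lo = mid                     # left side still larger: P must increase
        else:
            hi = mid
    return (lo + hi) / 2

beta = 0.1
values, probs = [0.0, 10.0], [0.9, 0.1]  # hypothetical loss distribution
g = lambda x: math.exp(beta * x)
exponential = math.log(sum(q * g(v) for v, q in zip(values, probs))) / beta
for lam in (0.0, 0.5, 1.0):
    print(lam, swiss_premium(values, probs, g, lam), exponential)
```

The same routine with λ = 1 and $g(x) = xe^{\alpha x}$ reproduces the Escher premium in this example, although, as noted above, that g is not monotone everywhere and the bisection argument is then only formal.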


8. The Orlicz principle (first considered by J. Haezendonck and M. Goovaerts in [54]).
Here we fix λ ∈ [0, 1] and an increasing g(x), and define P by

$$E\{g(X/P^{\lambda})\} = g(P^{1-\lambda}).$$

For λ = 0, we come to the mean value rule; for λ = 1, we get the equation

$$E\{g(X/P)\} = g(1). \tag{4.10}$$


The ratio X /P may be viewed as the value of X per premium unit—a relative risk, so to 
say. If g(x) is a utility function, then (4.10) requires the relative risk to be equivalent to one 
unit of money in the sense of expected utility. 


A comprehensive analysis of premium determination and further references may be 
found in the monograph [49], and a discussion of some further properties and references, 
e.g., in [49], [57], [58], [78], [110], [126]. 




5 EXERCISES 


Section 1


1. (a) Look up in Section 0.2.6 the connection between the d.f. and the density of a r.v. X,
       and the d.f. and the density of the r.v. Y = bX + c.

   (b) Using the results of Section 0.2.6, show that the parameter a in (1.1.3) and (1.1.10) is
       a scale parameter.


2. For any distribution with a non-zero mean m and standard deviation σ, the ratio σ/m is called
   a coefficient of variation (c.v.).

   (a) Will the c.v. change if we multiply the r.v. by a number?
   (b) Does the c.v. of a Γ-distribution depend on the scale parameter a?
   (c) What can you say about the Γ-distribution with c.v. equal to one?
   (d) Let ξ be log-normal and specified by (1.1.13).

       i. Does the c.v. depend on the parameter a?
       ii. Find the parameter b in the case when the c.v. equals 1/4.
       iii. Show that, if k is the c.v., then

       $$E\{\xi\} = e^{a}\sqrt{1 + k^2}, \qquad \mathrm{Var}\{\xi\} = e^{2a}(1 + k^2)k^2. \tag{5.1}$$


3. In the situation of distribution (1.1.10), find all c for which (1.1.8) is true. 


4. Show that the tail in (1.1.12) is “light” for r > 1 and “heavy” for r < 1. (Advice: Write
$\exp\{-ax^{r}\} = \exp\{-ax^{r-1}x\}$ and observe that for r > 1 and large x's, the quantity $ax^{r-1}$ gets
larger than any fixed number; and for r < 1, smaller than any fixed positive number.)


5. The distribution with the d.f. $F(x) = (x/\theta)^{\gamma}/[1 + (x/\theta)^{\gamma}]$ for x > 0, where θ > 0 is a parameter,
   is called loglogistic.

   (a) Show that θ is a scale parameter.

   (b) Show that, if X has such a distribution, the r.v. 1/X has a distribution of the same type.
       With which parameter?

   (c) Is this distribution heavy- or light-tailed?


6. (a) Verify (1.1.19), (1.1.20).

   (b) Consider the r.v. $\xi' = x_0\xi_1$, where a number $x_0 > 0$, and the distribution of $\xi_1$ is defined
       in (1.1.17). What values does $\xi'$ assume? Write the density, the expected value, and
       the variance of $\xi'$.

   (c) State rigorously a fact from Calculus from which it follows that the Pareto distribution
       is heavy-tailed.

   (d) Show that the fact that the Pareto distribution is heavy-tailed also follows from
       Proposition 1 of Section 1.1.4.

   (e) Which type of function Q(y) from Section 1.1 corresponds to the Pareto distribution?

   (f) Let $\xi = b\xi_1 + d$, where $\xi_1$ is distributed in accordance with (1.1.17). Find b and d for
       which ξ has the distribution (1.1.18).

   (g) Denote by $\bar F_{\alpha}(x)$ the tail in (1.1.17). Show that if $\alpha_1 > \alpha_2$, then $\bar F_{\alpha_2}$ is heavier than $\bar F_{\alpha_1}$
       in the sense of (1.1.9).


7. Using (1.2.2), (1.2.6), and (1.2.7), write formulas for $F_Y(x)$, E{Y}, and Var{Y} in the case
   of proportional insurance (1.3.5).


8. Let a random loss ξ be log-normal. Consider a proportional insurance policy where the
   insurer pays kξ, k < 1. Show that given that the loss event has occurred, the payment is also
   log-normal. Which parameters, if any, will change in the representation (1.1.13)? Will the
   parameter a get smaller or larger? (Hint: $ke^{x} = e^{\ln k + x}$.)


9. The probability of a fire in a certain structure during a given period is 0.03. If a fire occurs, the
   damage is uniformly distributed on the interval [0, 100] (say, the unit of money is $10,000).

   (a) Assume that the insurance company covers the total damage and denote by Y the (random)
       amount the company will pay. Write E{Y} and Var{Y}. Graph the distribution
       function of Y.

   (b) Do the same for the case when the insurance contract provides coverage above a
       deductible of 5.

   (c) Do the same for the case when the insurance contract provides coverage above a
       deductible of 5, and the maximum (limit) payment is 90 units.


10. A company insures the cost of injuries for a group of customers. The probability that a
    particular customer will be injured is 0.05. The cost of 40% of injuries follows the loglogistic
    distribution (see Exercise 5) with θ = 5 units of money and γ = 3. For the remaining 60% of
    the injuries, the cost is loglogistic with θ = 3 and γ = 2. The company establishes a deductible
    of 6.

    (a) Find the probability that an injury will result in a claim.

    (b) Find the probability that a particular contract (policy) will result in a claim.


11. You bought auto insurance with a deductible of $200 and with no restriction on maximal
    payment. Suppose that the probability of a loss event during a year is 0.1, and the probability
    that two loss events will occur is negligible. Assume also that the distribution of the loss in
    the case of a loss event is closely approximated by the exponential distribution with a mean
    of $1000.

    (a) What percent of the loss does the insurance cover on the average? (Hint: We consider
        the case where a loss event has occurred and are dealing with the payment divided by
        the loss.)

    (b) What is the probability that the company will pay nothing during a year?

    (c) What is the expected value and standard deviation of the payment?

    (d) Graph the d.f. of the payment.


12. Losses are modeled by the exponential distribution with a mean of 300. An insurance plan
    includes an ordinary deductible of 100 and pays 50% of the costs above 100 until the insured
    is paid 600. Then the plan pays 20% of the remaining costs. Find the expected payment in
    the case of loss event.


13. In the situation of Exercise 11, let the maximal payment be $2,000. Graph the d.f. of the
    payment.


14. (a) Graph r(x) in the case of franchise deductible (1.3.4).
    (b) Consider the problems of Examples 1.3-1,4 in this case.
    (c) Derive (1.3.18) from the general formula (1.3.19).


15. Losses follow the Pareto distribution (1.1.18) with some parameters θ and α > 1.

    (a) Find the expected payment with deductible d.

    (b) Do the same for α = 3, θ = 3 in the case of inflation with a rate of v = 4%. (Advice:
        Do not recalculate everything; use the previous answer and (1.3.24).)


16.* Find the loss elimination ratio as a function of deductible d, if the loss variable ξ has (a) an
     exponential distribution; (b) the uniform distribution on [0, a]; (c) the Pareto distribution in
     the form (1.1.18) for θ = 1, α = 2.


17.* Inflation impacts claims at a rate v. The loss amount ξ has (a) the Pareto distribution in the
     form (1.1.18), (b) the Weibull distribution. Which parameters of these distributions should
     be changed and how?


Section 2


18. Find the distribution of the sum of independent r.v.'s

    $$X_1 = \begin{cases} 0 & \text{with probability } 1/4, \\ 1 & \text{with probability } 3/4, \end{cases} \qquad X_2 = \begin{cases} 1 & \text{with probability } 2/3, \\ 2 & \text{with probability } 1/3. \end{cases}$$


19. (a) Give a common sense explanation why values of $S_2$ in Example 2.1.1-2 are not equally
        likely.

    (b) Carry out calculations in (2.1.7).


20. Find the distribution of the sum of independent r.v.'s $X_1$ and $X_2$ if

    (a) $X_1$ and $X_2$ are exponentials with parameters $a_1 = 1$ and $a_2 = 2$, respectively;

    (b) $X_1$ and $X_2$ are uniform on [0, 1] and [0, 2], respectively.


21. Prove Proposition 4 with a direct use of (2.1.3). (Advice: First, since a is a common scale
    parameter, without loss of generality, we may set a = 1. Second, after integration, you will
    come to a constant $B(\nu_1, \nu_2) = \int_0^1 (1 - t)^{\nu_1 - 1}t^{\nu_2 - 1}\,dt$. This is the so-called B(beta)-function.
    It is known that it is equal to $\Gamma(\nu_1)\Gamma(\nu_2)/\Gamma(\nu_1 + \nu_2)$, but we do not need to know this. Since
    we know that the resulting function is a density, the value of the constant may be obtained
    from the fact that the integral of a density equals one. On the way, you will automatically obtain
    the value of the B(beta)-function mentioned. See also, e.g., [120, p.153].)


22. Consider Example 2.1.2-1.

    (a) What is the mean of the total claim, that is, $E\{S_4\}$?

    (b) What is the probability that $S_4$ will be exactly equal to $E\{S_4\}$?

    (c) Using software (for example, Excel), compute and compare $P(S_4 > E\{S_4\})$ and
        $P(S_4 < E\{S_4\})$.


23. Let $S_n = X_1 + \ldots + X_n$, where the X's are independent, and $X_i$ has the Γ-distribution with
    parameters $(1, (\tfrac{1}{2})^{i})$. Show that for large n, the distribution of $S_n$ may be closely approximated
    by the standard exponential distribution.


24. (a) Can we switch $F_1$ and $F_2$ in (2.1.1), and $f_1$ and $f_2$ in (2.1.3)?

    (b) How should we change formulas (2.1.1) and (2.1.3) to obtain the d.f. and the density,
        respectively, of the r.v. $X_1 - X_2$?


25. Assume that the aggregate payments for two portfolios are independent and their distributions,
    as those of sums of many separate payments, are closely approximated by normal
    distributions. Let (m, σ²)-parameters of these distributions be (5, 8) and (3, 7), respectively.

    (a) Without any calculations, find the probability that the total payment will exceed 8.
    (b) Estimate the probability that the total payment will exceed 10.


26. In a county, the daily data of traffic accidents with serious injuries are values of independent
    Poisson r.v.'s. The mean values are 5 and 6 for Friday and Monday, respectively, and 3
    for other days. Write a formula and estimate, using software, the probability that a weekly
    number of such accidents will exceed 30.


27. The densities of independent r.v.'s $X_1$ and $X_2$ are fı (x) = Cyxe-* and fo(x) = Cox®e*,
    respectively, where $C_1$ and $C_2$ are constants.

    (a) Do we need to calculate these constants in order to find $P(X_1 + X_2 > x)$?

    (b) Estimate $P(X_1 + X_2 > 3)$.

    (c) Write $C_1$ and $C_2$.


28. Let $Z_n$ be a Poisson r.v. with parameter λ equal to an integer n. By using Proposition 3, show
    that $Z_n$ may be represented as $Y_1 + \ldots + Y_n$, where the Y's are independent Poisson r.v.'s with
    parameter λ = 1.


29. Let $Z_n$ be a Γ-r.v. with a scale parameter of a and parameter ν equal to an integer n. By using
    Proposition 4, show that $Z_n$ may be represented as $Y_1 + \ldots + Y_n$, where the Y's are independent
    exponential r.v.'s with parameter a.


30. Let ξ be exponentially distributed, and E{ξ} = m. Point out all z's for which the m.g.f. $M_{\xi}(z)$
    exists.


31. Find the m.g.f. of the geometric distribution (0.3.1.7) by using (0.4.3.1) and (0.4.1.5).

32. Find the m.g.f. of the negative binomial distribution (0.3.1.13) by using (0.4.3.2) and (0.4.1.5).

33. Which function in Fig.10 looks like a m.g.f.?

    [FIGURE 10. Graphs of three functions g(z), panels (a), (b), (c).]


34. What is the difference between the two distributions whose m.g.f.'s are graphed in Fig.11a
    and 11b?


    [FIGURE 11. Graphs of m.g.f.'s against z, panels (a), (b), (c); panel (c) shows $M_1(z)$ and $M_2(z)$.]


35. Let M(z) be the m.g.f. of a r.v. X, E{X} = m, Var{X} = σ². Write M′(0) and M″(0).

36. Compare the means and variances of the r.v.'s whose m.g.f.'s are graphed in Fig.11c. (The
    graphs of $M_1(z)$ and $M_2(z)$ are tangent at z = 0.)


37. (a) As we know, the m.g.f. of the standard normal distribution is $M(z) = e^{z^2/2}$. Can the
        function g(z) = e* /? and in general, gz) = e, where c > 0, be a m.g.f.? (Hint:
        Compute g′(0) and g″(0) and then jump to a conclusion.)

    (b) Can the function 1 +z“ be the m.g.f. of a r.v.?

    (c)* In general, show that if g(z) has two continuous derivatives and $g(z) = 1 + \varepsilon(z)z^2$,
        where ε(z) ≠ 0 for z ≠ 0, and ε(z) → 0 as z → 0, then g(z) cannot be a m.g.f.


38. Show that if P(X > 3) = 0.2, then the corresponding m.g.f. $M(z) > 0.2e^{3z}$. What can we say
    about the distribution of X if $M(z) \sim 0.2e^{3z}$ as z → ∞?


39. Let $X_1$, $X_2$ be independent standard exponential r.v.'s, and let $Y = X_1 - X_2$. It is known that
    the r.v. Y has the density

    $$f(x) = \frac{1}{2}e^{-|x|}. \tag{5.2}$$

    Such a distribution is called two-sided exponential.

    (a) Graph the density f(x).
    (b) Prove (5.2) by the convolution method.
    (c) Find the m.g.f. of Y and prove (5.2) using the method of Example 2.2-1.


40. Let $S = X_1 + X_2$, where $X_1$ and $X_2$ are exponential r.v.'s with $E\{X_1\} = 1$, $E\{X_2\} = 2$. Not
    computing a convolution but rather making use of the method of Example 2.2-1, prove that
    the density

    $$f_S(x) = -e^{-x} + 2\cdot\frac{1}{2}e^{-x/2} = e^{-x/2} - e^{-x} \ \text{ for } x \ge 0, \text{ and } = 0 \text{ for } x < 0.$$

    Show that the above function is, indeed, a density.


41.* Prove that, if α, β ∈ [0, 1] and α + β = 1, F(x) and G(x) are two d.f.'s, and the mixture
     H(x) = αF(x) + βG(x), then the m.g.f. of the convolution $H^{*2}$ is

     $$M_{H^{*2}}(z) = \alpha^2 M_F^2(z) + 2\alpha\beta M_F(z)M_G(z) + \beta^2 M_G^2(z),$$

     where functions $M_F$, $M_G$ are the corresponding m.g.f.'s. Show that this implies that
     $H^{*2}(x) = \alpha^2 F^{*2}(x) + 2\alpha\beta F(x) * G(x) + \beta^2 G^{*2}(x)$. Generalize it to the case $H^{*n}$, as it is presented in
     (2.1.1).


42.* In the fashion of Example 2.1.4-1, find the density

     (faa + 5 fa)^{*2}.


43.* Let $Z_{\lambda}$ be a Poisson r.v. with parameter λ, and $Z_{\lambda}^* = (Z_{\lambda} - \lambda)/\sqrt{\lambda}$, the normalized r.v. Prove
     that $Z_{\lambda}^*$ is asymptotically normal for large λ; that is, $P(Z_{\lambda}^* \le x) \to \Phi(x)$ as λ → ∞.

     (Advice: There are two ways to show this. First, using (0.4.3.3) and (0.4.1.5), we can write
     the m.g.f. of $Z_{\lambda}^*$ and prove that this m.g.f. converges to the m.g.f. of the standard normal r.v.
     However, a more explicit and shorter way is to use the fact stated in Exercise 28. If λ is an
     integer, the fact we are proving follows immediately from the Central Limit Theorem (see
     Section 0.6.2). If λ is not an integer, we can consider [λ], the integer part of λ, and, using
     the same Proposition 3, represent the r.v. $Z_{\lambda}$ as $Z_{[\lambda]} + R_{\lambda - [\lambda]}$, where $R_{\lambda - [\lambda]}$ is a Poisson r.v.
     with parameter λ − [λ] and independent of $Z_{[\lambda]}$. Then

     $$Z_{\lambda}^* = \sqrt{\frac{[\lambda]}{\lambda}}\cdot\frac{Z_{[\lambda]} - [\lambda]}{\sqrt{[\lambda]}} + \frac{R_{\lambda - [\lambda]} - (\lambda - [\lambda])}{\sqrt{\lambda}},$$

     and it remains to show that (a) $\sqrt{[\lambda]/\lambda} \to 1$; (b) $(Z_{[\lambda]} - [\lambda])/\sqrt{[\lambda]}$ is asymptotically normal
     by Exercise 28; and (c) $[R_{\lambda - [\lambda]} - (\lambda - [\lambda])]/\sqrt{\lambda} \to 0$ since λ − [λ] < 1.)


44.* Let $Z_{\nu}$ be a Γ-r.v. with parameters a and ν, $Z_{\nu}^* = (Z_{\nu} - E\{Z_{\nu}\})/\sqrt{\mathrm{Var}\{Z_{\nu}\}}$, the normalized
     r.v. Prove that $Z_{\nu}^*$ is asymptotically normal for large ν, that is, $P(Z_{\nu}^* \le x) \to \Phi(x)$ as ν → ∞.
     (Advice: Use the result of Exercise 29 and the scheme of Exercise 43.)


Section 3


45. Verify the results of Table 3.1.1-1. (In Excel, the corresponding command is ‘BINOMDIST’.)


46. In the case of proportional insurance, the payment Y = kX. Are the coefficients of variation
    of Y and X the same?


47. Consider two portfolios of 2000 and 3000 cars, respectively, insured for a single period of
    one year with $1000 deductible. The damage per car (per year) is distributed as follows:

            The first portfolio           |          The second portfolio
    Damage (in $1000) | Probability       | Damage (in $1000) | Probability
    0                 | 0.78              | 0                 | 0.8
    < 1 (× $1000)     | 0.12              | < 1 (× $1000)     | 0.1
    6                 | 0.05              | 6                 | 0.08
    11                | 0.05              | 11                | 0.02

    (a) Assuming all claims to be independent, compute the expectation and the variance of
        the total amount of claims from the two portfolios.
        Use the heuristic approach to normal approximation in order to estimate the security
        loading coefficient θ such that the probability that the insurance company will not
        suffer a loss by the end of the year is 0.99.

    (b) Assume that the number of cars (clients) in both portfolios became twice as large. Will
        θ in this case be larger or smaller? Determine how θ will change.

    (c)* Estimate θ for the case (a), using rigorous calculations.

48. An insurance company has a portfolio of 2000 cars insured for a single period of one year.
    The probability that a particular car will be involved in an accident is 0.05. The losses for
    different cars are independent. The distribution of the damage, if an accident occurs, may be
    approximated by the exponential distribution with a mean of $1000.

    (a) Graph the distribution function of a payment of the company for a separate policy.

    (b) Write the expected value and the variance of the payment for a separate policy, and for
        the total payment of the company.

    (c) Graph the distribution function of a payment of the company for a separate car in the
        case of a deductible of $200.

    (d) Considering the insurance without deductible, assuming all claims to be independent,
        and using normal approximation, compute the security loading coefficient for which
        the probability that the company would not lose money by the end of the year is 0.95.
        Use the heuristic approach.

    (e)* Do the same using the rigorous estimation approach.

    (f) Not providing any calculations, write the answer for Exercise 48d for the case when
        the number of cars is equal to 4000.


49.* Verify that if we replace each $X_i$ in Section 3 by r.v.'s $X_i' = a + bX_i$, where a is an arbitrary
     number, and b ≠ 0, then the Lyapunov fraction will not change.


50. (a) i. Recalculate results of Example 3.2-1 for q = 0.1, n = 1000, and β = 0.7. (Advice:
           Do not recalculate everything; try to understand what should change, and what
           should not.)

        ii. Compute at least some entities in Table 3.2-1.

    (b) Recalculate results of Example 3.2-2 for $q_1 = q_2 = 0.1$, $z_1 = 10$, $z_2 = 20$, $n_1 = 2000$,
        $n_2 = 1000$, and β = 0.8. Compare $\theta_1$, $\theta_2$ and θ, and find premiums for both portfolios.


51. (a) Find a heuristic estimate for the coefficient θ in the case when β = 0.9, n = 4000,
        q = 0.1, and the payment in the case of a loss event is an exponential r.v. with unit
        mean.

    (b)* Provide a rigorous estimate.


52. Assume that for the group from Example 3.2-1, due to a law, θ cannot be larger than 0.08.

    (a) Estimate n required for β = 0.9, making use of the heuristic approximation. (Hint: You
        can use the results in Table 1 in this example.)

    (b)* Using software (for example, providing an Excel worksheet), give a rigorous estimate
        for n. (The corresponding example is Example 3.3-2.)


53.* (a) Let $X_1$ be an exponential r.v. with $E\{X_1\} = m$, let $X_2$ be uniform on [0, 2m], and let

     $$X = \begin{cases} X_1 & \text{with probability } \tfrac{1}{2}, \\ -X_2 & \text{with probability } \tfrac{1}{2}. \end{cases}$$

         Should the skewness coefficient of X depend on m? Can you guess without calculations
         whether the skewness coefficient is positive or negative? Find it.

     (b) Find the skewness coefficient for an arbitrary uniform distribution.


54.* 


JNE 


56.* 


57.* 


59.* 


60. 


61. 


62. 


63. 


5. Exercises 197 


Consider the risk of a certain loss of z occurring with a probability of g. Explain with- 
out calculations why the skewness coefficient does not depend on z. Show that y= (1 — 


2q)/Vq(1—4). 


Let S, =X,+...+X,, where X’s are independent and have the I -distribution with parameters 
a,v. Then, by Proposition 4, S, has the I’-distribution with parameters a,nv. Show that the 
T-approximation of Section 3.4 leads to exactly this distribution. 


Provide the I -approximation of Section 3.4 for the distribution of S1ọ = X1 +... + X19 with 
Mi 10, Oj 5; 133 75. 


57.* Let $S_n = X_1 + \dots + X_n$, where the $X$'s are independent, identically distributed, $E\{X_i\} = m$, and $\gamma = 1$. The normal approximation for $P(S_n \le mn)$ gives 1/2 (show why). As a matter of fact, if the $X$'s are not symmetric, it is not an absolutely precise approximation. Calculate what approximation the asymptotic expansion in (3.4.5)-(3.4.6) would give in this case.


58. In Example 3.4-1, we have shown that the skewness coefficient for the $\Gamma$-distribution with main parameter $\nu$ is equal to $2/\sqrt{\nu}$. Using the result of Exercise 44 and restricting yourself, for simplicity, to integer $\nu$'s, connect this fact with the expansion (3.4.5). (Advice: Since we consider normalized r.v.'s, without loss of generality, we may set the scale parameter $a$ of the $\Gamma$-distribution equal to one.)


59.* Provide an Excel worksheet for a general solution to the problem from Example 3.4.3-1. Analyze results for various $n$ and $q$.


Section 4* 


60. Let $X$ be standard exponential, and let $Y$ be uniform on $[0, 2]$. So, $E\{X\} = E\{Y\}$. Find $\pi(X)$ and $\pi(Y)$ proceeding from Principles 1-3, Principle 4 with $g(x) = x$, Principle 5b, and Principle 6 with $\alpha = 1/2$. Compare and interpret the results.


61. Show that for $g(x) = x^\alpha$ and $\lambda = 1$, the Orlicz principle may be reduced to the mean value (certainty equivalent) principle.


62. We say that a premium principle $\pi(\cdot)$ satisfies the positive homogeneity property if $\pi(kX) = k\pi(X)$ for any $k > 0$ and for all $X$'s for which the principle is defined. We say that the translation invariance property holds if $\pi(X + c) = \pi(X) + c$ for any $c$ and all $X$'s mentioned. (a) Verify whether these properties are true for Principles 1-3 of Section 4. (b) Do the same for Principle 4. For which particular functions $g(x)$ are the properties under discussion true? (c) Explore Principle 5. Suggest particular $u(x)$ for which both properties will be true. (d) Explore Principle 6. (Advice: See Section 1.1.3 for more details and comments on the properties we discuss in this and in the next Exercise 63. One should, however, keep in mind that in Section 1.1.3 we deal with gains, while when considering premiums, we deal with possible losses. Regarding this exercise, to show that a property does not hold, it suffices to consider a particular example. Also keep in mind that some premium principles are particular cases of others.)


63. We say that a premium principle $\pi(\cdot)$ is additive if for any two independent r.v.'s $X_1$ and $X_2$,
$$\pi(X_1 + X_2) = \pi(X_1) + \pi(X_2). \tag{5.3}$$
We say that $\pi(\cdot)$ is sub-additive if
$$\pi(X_1 + X_2) \le \pi(X_1) + \pi(X_2). \tag{5.4}$$

Note that unlike in Section 1.1.3, we consider here independent r.v.'s. The sub-additivity property, especially when we have the strict inequality in (5.4), is a desirable property: the total premium does not become higher when we combine independent risks in one portfolio. See also comments in Exercise 62.


(a) Verify whether (5.4) is true for Principles 1-3 of Section 4. Determine when the standard deviation principle is strictly sub-additive.

(b) Show that the exponential and Esscher principles are additive.

(c) Does translation invariance follow from additivity?

(d) Let $\pi(c) = c$ for any $c$ (which is a natural property: for a certain loss the premium should be equal to this loss). Show that in this case, translation invariance follows from sub-additivity. (Hint: $\pi(X + c) \le \pi(X) + c$. On the other hand, $\pi(X) = \pi(X + c - c) \le \pi(X + c) + \pi(-c)$.)

(e) Show that any $\pi(X)$ equal to a linear combination of cumulants of $X$ is additive. Show that the variance, exponential, and Esscher principles are presented by linear combinations of cumulants (and this is why they are additive).


64. Prove that premium (4.7) is non-decreasing as a function of $\alpha$. (Advice: Differentiating (4.7) in $\alpha$, we will have a fraction with the numerator
$$E\{X^2 e^{\alpha X}\}\,E\{e^{\alpha X}\} - \left(E\{X e^{\alpha X}\}\right)^2.$$
To prove that the last expression is non-negative, write $E\{Xe^{\alpha X}\} = E\{Xe^{\alpha X/2}\,e^{\alpha X/2}\}$ and apply the Cauchy-Schwarz inequality (0.2.4.4).)


65. (a) Look up Exercise 1.33 where we proved (based on the given advice) that $P$ in (4.4) is increasing in $\beta$. So, this parameter may serve as a risk aversion characteristic, which was discussed repeatedly in Sections 1.3.2 and 1.3.4.

(b) Consider $\pi(X) = \frac{1}{\beta}\ln E\{e^{\beta X}\}$ and the function $c(X) = -\frac{1}{\beta}\ln E\{e^{-\beta X}\}$ from Section 1.3.1.3. Prove that $c(X) \le E\{X\}$ and $c(X)$ is decreasing in $\beta$. Interpret the last fact. Show that while $\pi(X)$ “pays more attention” to large values of $X$, the criterion $c(X)$ ignores very large values (the saturation effect). Interpret this fact in the case when $X$ is a loss, and when it is an income. (Advice: First, look up general properties of certainty equivalents in Section 1.3.4.2. Second, consider $X = 0$ and $a$ with equal probabilities, for example, and analyze the asymptotic behavior of $\pi(X)$ and $c(X)$ for large $a$.)


Chapter 3 


A Collective Risk Model for a Short Period 


From a purely mathematical point of view, the model of this chapter differs from what we 
considered in Chapter 2 by the fact that now we will explore sums of r.v.’s where not only 
separate terms (addends) are random but the number of the terms is random also. In other 
words, our object of study is the r.v. 


$$S = S_N = \sum_{j=1}^{N} X_j, \tag{0.1}$$
where $N$ and $X_1, X_2, \ldots$ are r.v.'s as well. If $N$ assumes zero value, we set $S = 0$.

Such a model admits at least two interpretations. 

The first concerns a future risk portfolio in the situation when contracts have not yet been 
issued, and we do not know how many clients the company may have. In this case, N may 
be viewed as the number of future clients, and the X’s as future payments to the clients. In 
such a situation, each X may assume zero value with positive probability. 

The second interpretation deals with a settled portfolio. In this case, however, we consider the portfolio as a whole, being interested not in separate clients but rather in the total claim the company will have to pay out. Here N is the number of future claims, and the X’s represent the payments corresponding to these claims. In such a scheme, it is natural, though not necessary, to assume the X’s to be positive. The term collective reflects the fact that we view the portfolio as one insured unit, as a whole.

We mostly follow the latter interpretation which appears much more frequently in appli- 
cations. 

In this chapter, when considering different characteristics of r.v.’s such as expectations, 
moments, m.g.f.’s, etc., we assume—not necessarily stating it each time explicitly—that in 
the situations under consideration, all these characteristics are well defined and finite. 

Before exploring the models of this chapter in detail, it is convenient to establish some 
basic facts which we will use repeatedly throughout this book. 


1 THREE BASIC PROPOSITIONS 


In this section, (0.1) is a purely mathematical construction which will be interpreted in the following sections in different ways. Assume that the r.v.'s $X_1, X_2, \ldots$ and $N$ are mutually independent, the r.v.'s $X_1, X_2, \ldots$ are identically distributed, and $N$ assumes values $0, 1, 2, \ldots$. Set $m = E\{X_i\}$, $\sigma^2 = \mathrm{Var}\{X_i\}$.




Proposition 1 The mean value of $S$ is given by
$$E\{S\} = m\,E\{N\}. \tag{1.1}$$
In particular, if $N$ is a Poisson r.v. with parameter $\lambda$, then
$$E\{S\} = m\lambda. \tag{1.2}$$


Proof. By the formula for total expectation (0.7.2.1), $E\{S\} = E\{E\{S \mid N\}\}$. In the conditional expectation $E\{S \mid N\}$, the value of the r.v. $N$ is given, and we deal with a sum of a fixed number of addends. Hence, $E\{S \mid N\} = mN$ and $E\{S\} = E\{mN\} = mE\{N\}$. ∎


> In Section 5.2.4, we will show that the condition of independence of $N$ and the $X$'s is not necessary, and (1.1) remains true if for each $n$, the event $\{N = n\}$ is specified by values of $X_1, \ldots, X_n$. The last condition holds, for example, if $N$ is the first number $n$ for which the (growing) sum $S_n = X_1 + \dots + X_n$ exceeds a fixed level. See Section 5.2.4.2 for detail. <


Proposition 2 The variance of $S$ is given by
$$\mathrm{Var}\{S\} = \sigma^2 E\{N\} + m^2\,\mathrm{Var}\{N\}. \tag{1.3}$$
In particular, if $N$ is a Poisson r.v. with parameter $\lambda$, then
$$\mathrm{Var}\{S\} = \lambda E\{X^2\}, \tag{1.4}$$
where $X$ is a r.v. distributed as the $X_i$'s.


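Proof. By (0.7.3.2), $\mathrm{Var}\{S\} = \mathrm{Var}\{E\{S \mid N\}\} + E\{\mathrm{Var}\{S \mid N\}\}$. Reasoning as in the proof of (1.1), we can write that, given $N$, the conditional variance $\mathrm{Var}\{S \mid N\} = \sigma^2 N$. We have also shown above that $E\{S \mid N\} = mN$. So, $\mathrm{Var}\{S\} = E\{\sigma^2 N\} + \mathrm{Var}\{mN\} = \sigma^2 E\{N\} + m^2\,\mathrm{Var}\{N\}$. If $N$ is Poisson, $E\{N\} = \mathrm{Var}\{N\} = \lambda$, and $\mathrm{Var}\{S\} = \lambda(\sigma^2 + m^2) = \lambda E\{X^2\}$. ∎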
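
Formulas (1.1) and (1.3) can be checked on a small example in which the distribution of S is computed exactly by convolution. The following Python sketch is only an illustration: the two distributions `pmf_n` and `pmf_x` are hypothetical numbers chosen for the check, not data from the text.

```python
from collections import defaultdict

def convolve(f, g):
    """Distribution of the sum of two independent discrete r.v.'s."""
    h = defaultdict(float)
    for x, px in f.items():
        for y, py in g.items():
            h[x + y] += px * py
    return dict(h)

def mean(f):
    return sum(x * p for x, p in f.items())

def var(f):
    mu = mean(f)
    return sum((x - mu) ** 2 * p for x, p in f.items())

def compound(pmf_n, pmf_x):
    """Exact pmf of S = X_1 + ... + X_N when N has finite support."""
    pmf_s = defaultdict(float)
    conv = {0: 1.0}                      # the empty sum (N = 0) equals 0
    for n in range(max(pmf_n) + 1):
        for s, p in conv.items():
            pmf_s[s] += pmf_n.get(n, 0.0) * p
        conv = convolve(conv, pmf_x)     # distribution of X_1 + ... + X_{n+1}
    return dict(pmf_s)

pmf_n = {0: 0.2, 1: 0.5, 2: 0.2, 3: 0.1}   # hypothetical distribution of N
pmf_x = {1: 0.5, 2: 0.3, 5: 0.2}           # hypothetical distribution of X

pmf_s = compound(pmf_n, pmf_x)
m, sigma2 = mean(pmf_x), var(pmf_x)

lhs_mean, rhs_mean = mean(pmf_s), m * mean(pmf_n)                         # (1.1)
lhs_var, rhs_var = var(pmf_s), sigma2 * mean(pmf_n) + m ** 2 * var(pmf_n)  # (1.3)
print(lhs_mean, rhs_mean)
print(lhs_var, rhs_var)
```

Both printed pairs should coincide up to rounding, whatever finite distributions are substituted for the two hypothetical ones.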


Proposition 3 For all $z$ for which the m.g.f.'s below are well defined, the m.g.f. of $S$ is
$$M_S(z) = M_N(\ln M_X(z)), \tag{1.5}$$
where $M_N(\cdot)$ is the m.g.f. of $N$, and $M_X(z)$ is the (common) m.g.f. of the r.v.'s $X_i$.
In particular, if $N$ is a Poisson r.v. with parameter $\lambda$, then
$$M_S(z) = \exp\{\lambda(M_X(z) - 1)\}. \tag{1.6}$$
Proof. We have
$$M_S(z) = E\{e^{zS}\} = E\{E\{e^{zS} \mid N\}\}.$$
In $E\{e^{zS} \mid N\}$, the value of $N$ is given, so the conditional expectation $E\{e^{zS} \mid N\}$ is the m.g.f. of a sum of a fixed number of terms. Hence, by the main property of m.g.f.'s,
$$E\{e^{zS} \mid N\} = (M_X(z))^N = e^{N \ln M_X(z)},$$
and
$$E\{e^{zS}\} = E\{e^{N \ln M_X(z)}\}. \tag{1.7}$$
The r.-h.s. of (1.7) is the m.g.f. of $N$ at the point $\ln M_X(z)$, which implies (1.5).
If $N$ is a Poisson r.v. with parameter $\lambda$, then the m.g.f. $M_N(z) = \exp\{\lambda(e^z - 1)\}$. Replacing $z$ by $\ln M_X(z)$, we obtain (1.6). ∎


Note also that in the case when the m.g.f.’s above exist, Propositions 1-2 follow from 
Proposition 3 (see Exercise 1) but the direct proofs above are simpler than the derivation 
from (1.5). 
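
As a numerical illustration of how the moments can be read off the m.g.f. in the Poisson case, the sketch below differentiates $\ln M_S(z) = \lambda(M_X(z) - 1)$ at zero by finite differences and compares the results with (1.2) and (1.4). The value of `lam` and the claim-size distribution `pmf_x` are hypothetical, and the step `h` is an arbitrary small number.

```python
from math import exp

lam = 3.0                                   # hypothetical Poisson parameter
pmf_x = {1: 0.5, 2: 0.3, 5: 0.2}            # hypothetical distribution of X

def mgf_x(z):
    """M_X(z) = E{exp(zX)} for the discrete distribution above."""
    return sum(p * exp(z * x) for x, p in pmf_x.items())

def log_mgf_s(z):
    """ln M_S(z) = lam * (M_X(z) - 1), which is (1.6) after taking logarithms."""
    return lam * (mgf_x(z) - 1.0)

h = 1e-4
# First and second derivatives of ln M_S at 0 give the mean and the variance of S.
mean_s = (log_mgf_s(h) - log_mgf_s(-h)) / (2 * h)
var_s = (log_mgf_s(h) - 2 * log_mgf_s(0.0) + log_mgf_s(-h)) / h ** 2

m = sum(x * p for x, p in pmf_x.items())          # E{X}
ex2 = sum(x * x * p for x, p in pmf_x.items())    # E{X^2}
print(mean_s, lam * m)      # compare with (1.2)
print(var_s, lam * ex2)     # compare with (1.4)
```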

Various examples of applications of Propositions 1-3 will be given in further sections. 
Now, we begin to consider the collective risk model in detail starting with possible distri- 
butions of N. 


2 COUNTING OR FREQUENCY DISTRIBUTIONS 


The distribution of the r.v. N is sometimes called a counting or frequency distribution. Different types of this distribution are considered in the theory and applications. We begin with the Poisson distribution, which is, in a certain sense, the simplest and most important one.


2.1 The Poisson distribution and theorem 
2.1.1 A heuristic approximation 


The Poisson distribution is that of an integer valued r.v. $Z$ such that
$$P(Z = k) = e^{-\lambda}\frac{\lambda^k}{k!} \quad \text{for } k = 0, 1, \ldots, \tag{2.1.1}$$
where $\lambda$ is a positive parameter. As is proved in almost any course in Probability,
$$E\{Z\} = \lambda, \qquad \mathrm{Var}\{Z\} = \lambda \tag{2.1.2}$$


(see also Section 0.3.1.5 and Exercise 3). 

There are at least two explanations why this distribution plays a key role in our model. 
First, the Poisson distribution may appear when we view the flow of claims arriving at the 
company as a random process in continuous time. In Chapter 4, we will consider in detail 
how some natural conditions on the evolution of this process lead to the Poisson distribution 
for the number of claims arriving during any given period of time. 

Another explanation is connected with Poisson’s theorem. Consider a sequence of n 
independent trials with the probability of success at each trial equal to p. Let N be the total 
number of successes. As we know, N has the binomial distribution, that is, 


$$P(N = k) = \binom{n}{k} p^k (1-p)^{n-k}. \tag{2.1.3}$$


The Poisson theorem tells that if n is large and p is small in a way that E{N} = np is 
“neither small nor large”, then the distribution of N is well approximated by a Poisson 
distribution. To state it rigorously, we assume that the probability p depends on n, and 


$$p = p_n = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right), \tag{2.1.4}$$

where $\lambda$ is a positive number. We again use the Calculus symbol $o(x)$ which denotes a function converging to zero, as $x \to 0$, faster than $x$; that is, $o(x)/x \to 0$ (see Appendix, Sec. 4.1 for details). In other words, the second term $o(1/n)$ in (2.1.4) is negligible for large $n$ with respect to the first term $\lambda/n$. In the first reading, the reader can even ignore the term $o(1/n)$. Thus, the r.v. $N$ in this framework depends on $n$, so we write $N = N_n$.


Theorem 4 (Poisson). For any $k$,

$$P(N_n = k) \to e^{-\lambda}\frac{\lambda^k}{k!} \quad \text{as } n \to \infty. \tag{2.1.5}$$


The theorem is proved practically in any textbook on probability; see, e.g., [102], [116], 
[122]. 

Consider, for example, a portfolio of n policies “functioning” independently, and suppose 
that for each policy, the probability of the occurrence (during a fixed period) of a claim 
equals the same number p. Then, we may identify policies with independent trials and in 
the case of “large” n and “small” p, approximate the distribution of the total number of 
claims by the Poisson distribution with the parameter $\lambda = pn$.¹

It is important to note that the accuracy of the Poisson approximation may be high. 


EXAMPLE 1. Assume $n = 30$, and, initially, $p = 0.5$. Then $\lambda = 15$. The Excel worksheet in Fig. 1a shows the binomial [the r.-h.s. of (2.1.3)] and Poisson [the r.-h.s. of (2.1.5)] probabilities in columns B and C, respectively. The corresponding graphs are in the chart. The values of the distribution functions are given in columns E and F, and the difference between them in column G. We see that the distributions are not close (see the chart), and the maximal difference, in absolute value, between the d.f.’s is 0.0868 (for $k = 12$), which is large. This is not surprising since $p = 0.5$ is “not small”.

Now, let $n$ still be 30, but let $p = 0.1$. Then $\lambda = 3$. The result given in Fig. 1b shows that now the distributions are fairly close and the chart looks just perfect. The largest positive difference between the d.f.’s is 0.0107 (for $k = 5$), and the largest difference in absolute value is 0.0155 (for $k = 1$), which is not bad at all. It is a bit surprising that such a good approximation can appear for relatively small $n$.

Consider a larger $n$, say, $n = 60$, keeping $\lambda = 3$. Then $p = 0.05$. The result is given in Fig. 1c. We see that the largest positive difference between the d.f.’s is 0.0052 ($k = 5$), and the largest difference in absolute value is 0.0076 ($k = 1$).

In Section 2.1.2, we consider rigorous estimates of the accuracy of Poisson approxima- 
tion. 
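
The worksheet comparison of this example can also be reproduced in a few lines of code. The Python sketch below is an illustration rather than the book's worksheet: the function name `max_df_gap` is hypothetical. For each pair $(n, p)$ it accumulates the binomial and Poisson d.f.'s and records the largest absolute difference and the $k$ at which it occurs.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """The r.-h.s. of (2.1.3)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """The r.-h.s. of (2.1.5)."""
    return exp(-lam) * lam ** k / factorial(k)

def max_df_gap(n, p):
    """Largest |binomial d.f. - Poisson d.f.| over k = 0..n, and the maximizing k."""
    lam = n * p
    f_binom = f_pois = 0.0
    worst, worst_k = 0.0, 0
    for k in range(n + 1):
        f_binom += binom_pmf(k, n, p)
        f_pois += poisson_pmf(k, lam)
        gap = abs(f_binom - f_pois)
        if gap > worst:
            worst, worst_k = gap, k
    return worst, worst_k

for n, p in [(30, 0.5), (30, 0.1), (60, 0.05)]:
    gap, k = max_df_gap(n, p)
    print(f"n={n}, p={p}: max |difference| = {gap:.4f} at k = {k}")
```

The three cases correspond to the three panels of Figure 1; the function measures the absolute difference, so for the second and third cases it picks up the deviation at $k = 1$ rather than the largest positive one at $k = 5$.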


¹ We are aware that in Chapter 2, following a tradition, we denoted the probability of claim occurrence (loss event) by $q$. Now, when we are dealing with the binomial distribution, a stronger tradition requires us to denote the probability of success by $p$. In this chapter, we will not work with the models of Chapter 2, so such a replacement should not cause confusion.


[Figure 1, panels (a) and (b). Excel worksheets listing, for k = 0, 1, ..., 30, the binomial probabilities, the Poisson probabilities, the binomial d.f., the Poisson d.f., and the difference between the two d.f.'s, together with a chart of the binomial and Poisson probabilities. Panel (a): n = 30, p = 0.5, λ = np = 15; the difference between the d.f.'s ranges from -0.0868 (k = 12) to 0.0803 (k = 18). Panel (b): n = 30, p = 0.1, λ = np = 3; the difference ranges from -0.0155 (k = 1) to 0.0107 (k = 5).]

FIGURE 1. The accuracy of Poisson approximation.


Next, we consider the case of different probabilities of successes, which for the portfolio example corresponds to a non-homogeneous group of clients. Let
$$I_j = \begin{cases} 1 & \text{with probability } p_j, \\ 0 & \text{with probability } 1 - p_j. \end{cases}$$

[Figure 1, panel (c). The same worksheet and chart for n = 60, p = 0.05, λ = np = 3, with k = 0, 1, ..., 30; the difference between the d.f.'s ranges from -0.0076 (k = 1) to 0.0052 (k = 5).]

FIGURE 1. (continued).


Say, $I_j$ is the indicator of the event that the $j$th customer will make a claim. Let
$$N_n = \sum_{j=1}^{n} I_j$$
(the total number of claims).

The distribution of $N_n$ is sometimes called the Poisson-Binomial. We will see that, if the $p_j$'s are small, we again can apply the Poisson approximation. To state it rigorously, assume, as we did above, that each probability $p_j$ depends on $n$, or in symbols, $p_j = p_{jn}$. Let
$$\bar{p}_n = \frac{1}{n}\left(p_{1n} + \dots + p_{nn}\right),$$
the average probability.


Theorem 5 (Generalized Poisson). Assume that
$$\max_{j \le n} p_{jn} \to 0, \tag{2.1.6}$$
and
$$\bar{p}_n = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right) \tag{2.1.7}$$
for some $\lambda > 0$. Then (2.1.5) is true.


We omit the proof that may be found in many Probability textbooks, e.g., in [38], [116], 
[120]. Usually it is carried out with use of m.g.f.’s. In Exercise 7, we show that the “classi- 
cal” Poisson theorem follows from Theorem 5. 




The main condition (2.1.7) is the same as (2.1.4); we only impose it on the average probability. The significance of the additional condition (2.1.6) is that all probabilities $p_{jn}$ should be small for large $n$. It may be shown that if, say, $p_{1n} \not\to 0$, then (2.1.5) cannot be true. See Exercise 7 which contains details of how to prove it.

If (2.1.6) does not hold, we interpret it as if among all customers, there is at least one 
whose riskiness is comparable with the riskiness of the whole portfolio. 

In applications of this scheme, we take as an approximation the Poisson distribution with $\lambda = n\bar{p}_n = p_{1n} + \dots + p_{nn}$.

Consider, for example, a portfolio with $d$ homogeneous groups. Denote by $n_i$ and $r_i$ the number of clients and the probability of a loss event for each separate client in the $i$th group, respectively. In other words, $p_{jn} = r_i$ if the $j$th client belongs to the $i$th group. Set $n = n_1 + \dots + n_d$, the total number of clients; $w_i = n_i/n$, the “weight” of the $i$th group; and $\lambda_i = n_i r_i$, the expected number of claims in the $i$th group. Then
$$\bar{p}_n = \frac{1}{n}\sum_{i=1}^{d} n_i r_i = \sum_{i=1}^{d} w_i r_i, \qquad \lambda = n\bar{p}_n = \sum_{i=1}^{d} n_i r_i = \sum_{i=1}^{d} \lambda_i.$$


EXAMPLE 2. Let $d = 2$, $n_1 = 200$, $r_1 = 0.01$, $n_2 = 100$, and $r_2 = 0.02$. Then $n = 300$, $w_1 = 2/3$, $w_2 = 1/3$, $\bar{p}_n = \frac{2}{3}\cdot 0.01 + \frac{1}{3}\cdot 0.02 = 0.0133\ldots$, and $\lambda = 2 + 2 = 4$. So, we can approximate $P(N_n = k)$ by the corresponding Poisson probabilities with $\lambda = 4$. We will see that the accuracy of such an approximation is relatively high.
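
For a portfolio of this size, the exact Poisson-Binomial distribution is easy to compute by adding one client at a time. The following Python sketch does this for the two groups of the example and compares the resulting d.f. with the Poisson d.f. for $\lambda = 4$; the function name `poisson_binomial_pmf` is hypothetical, and the code is meant only as an illustration of the approximation.

```python
from math import exp

def poisson_binomial_pmf(probs):
    """Exact pmf of the number of claims for independent clients with
    claim probabilities `probs`, built up one client at a time."""
    pmf = [1.0]                          # with no clients, zero claims for sure
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, w in enumerate(pmf):
            nxt[k] += w * (1 - p)        # the new client makes no claim
            nxt[k + 1] += w * p          # the new client makes a claim
        pmf = nxt
    return pmf

probs = [0.01] * 200 + [0.02] * 100      # the two groups of the example
lam = sum(probs)                         # lambda = 2 + 2 = 4
pmf = poisson_binomial_pmf(probs)

gap, f_exact, f_pois, term = 0.0, 0.0, 0.0, exp(-lam)
for k, w in enumerate(pmf):
    f_exact += w                         # exact d.f. at k
    f_pois += term                       # Poisson d.f. at k
    gap = max(gap, abs(f_exact - f_pois))
    term *= lam / (k + 1)                # next Poisson probability
print(lam, gap)
```

The printed gap is the largest absolute difference between the two d.f.'s over $k = 0, \ldots, 300$.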


In conclusion, we mention a common interpretation of probabilities P(N > n). When 
talking about the number of claims arriving, we mean a certain period of time, say a year. 
Let the numbers of claims in consecutive years be independent. Then, in the situation when 
for example P(N > 3) = 1/5, people sometimes say that four or more claims happen on 
the average once each five years. See Exercise 6. 


Route 1 => page 215,  Route2 => page 207 


2.1.2 The accuracy of the Poisson approximation 


The rate of convergence in Theorem 5 depends on what we consider: the probability for $N_n$ to take on a particular value, the distribution function of $N_n$, that is, the probability for $N_n$ to be less than or equal to a particular number, or the probability for $N_n$ to take on a value from an arbitrary set.

Let $n$ be fixed, $\lambda = p_{1n} + \dots + p_{nn} > 0$, and
$$\nu_n = \frac{1}{\lambda}\sum_{j=1}^{n} p_{jn}^2. \tag{2.1.8}$$

In the classical Poisson scheme, where the r.v.'s $I_j$ are identically distributed, for all $j$, the probabilities $p_{jn}$ equal some $p$, perhaps depending on $n$. Then $\lambda = np$, and $\nu_n = \frac{1}{\lambda}np^2 = p$.

To understand the order of $\nu_n$ in the general case, assume that all $p_{jn}$ have the same order; more specifically, $\frac{c_1}{n} \le p_{jn} \le \frac{c_2}{n}$ for positive constants $c_1, c_2$. Then $c_1 \le \lambda \le c_2$, and replacing $p_{jn}$ in (2.1.8) by their bounds, we get that $\nu_n$ has the order of $\frac{1}{n}$. More precisely,
$$\frac{C_1}{n} \le \nu_n \le \frac{C_2}{n},$$
where $C_1 = c_1^2/c_2$, and $C_2 = c_2^2/c_1$.
Denote by $Z_\lambda$ a Poisson r.v. with parameter $\lambda$.


Theorem 6 For any set of integers $B$,
$$|P(N_n \in B) - P(Z_\lambda \in B)| \le 2.08\,\nu_n. \tag{2.1.9}$$
For $p_{jn} = p$, (2.1.9) implies
$$|P(N_n \in B) - P(Z_\lambda \in B)| \le 2.08\,p.$$
Now let us consider distribution functions.


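Theorem 7 For any $x$,
$$|P(N_n \le x) - P(Z_\lambda \le x)| \le C\,\frac{\nu_n}{1 - \nu_n}, \tag{2.1.10}$$
where the absolute constant $C = \frac{1}{2} + \sqrt{\frac{\pi}{8}}$.
Note that $C \approx 1.127 < 1.13$.
For $p_{jn} = p$, (2.1.10) implies
$$|P(N_n \le x) - P(Z_\lambda \le x)| \le C\,\frac{p}{1 - p}. \tag{2.1.11}$$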

It is easy to calculate that if $\nu_n < 0.45$, then the r.-h.s. of (2.1.10) is less than the r.-h.s. of (2.1.9), and hence Theorem 7 gives a greater accuracy than Theorem 6. This is not surprising since in (2.1.10) we consider only d.f.’s. However, if we are interested in $P(N_n \in B)$ for more complicated sets $B$ than a half-line, then we should appeal to Theorem 6.

The bound (2.1.9) is a generalization of a bound from [80], and is obtained in [104]. The bound (2.1.10) was obtained in [132]. A survey may be found in [12].

Note also that for $\nu_n > 0.45$, it is meaningless to talk about the Poisson approximation, and the r.-h.s. of (2.1.9) is larger than 0.93. In this case, the normal approximation would be much more adequate.


EXAMPLE 1. Consider the situation of Example 2.1.1-2. We have
$$\nu_n = \frac{1}{4}\left((0.01)^2 \times 200 + (0.02)^2 \times 100\right) = 0.015,$$
and, by (2.1.10),
$$|P(N_n \le x) - P(Z_\lambda \le x)| \le C\,\frac{0.015}{1 - 0.015} \approx 0.017,$$
which is not so bad.


In conclusion, note that the bounds (2.1.9) and (2.1.10) are universal, that is, they serve for all possible values of $p_{jn}$ and $n$. In particular cases, the real accuracy may be better than the bounds from Theorems 6 and 7.


EXAMPLE 2. Consider the situation of Example 2.1.1-1. For $p = 0.1$, the bound (2.1.11) gives the accuracy $\approx 1.13 \cdot \frac{0.1}{0.9} \approx 0.126$, while the calculations in Example 2.1.1-1 gave deviations below 0.016. If $p = 0.05$, then (2.1.11) results in $\approx 0.060$, while the particular calculations give deviations below 0.008. Such a discrepancy is connected also with the fact that here we consider the classical case of identically distributed r.v.’s, while (2.1.11) with the constant $C$ came from a result for the general case of non-identically distributed addends.
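
The two bounds are simple enough to tabulate. The Python sketch below assumes that the bound for sets is $2.08\,\nu_n$, that the bound for d.f.'s is $C\nu_n/(1 - \nu_n)$, and that the constant is $C = 1/2 + \sqrt{\pi/8} \approx 1.127$; the function names are hypothetical. It evaluates both bounds for the cases discussed in this section and locates the value of $\nu_n$ at which they cross.

```python
from math import pi, sqrt

C = 0.5 + sqrt(pi / 8)            # assumed form of the absolute constant, about 1.127

def bound_sets(nu):
    """Bound on |P(N_n in B) - P(Z in B)| for arbitrary sets B (assumed 2.08 * nu)."""
    return 2.08 * nu

def bound_df(nu):
    """Bound on the difference of the d.f.'s (assumed C * nu / (1 - nu))."""
    return C * nu / (1 - nu)

# Classical scheme: nu_n = p.  Two-group portfolio: nu_n = 0.015.
for label, nu in [("p = 0.1", 0.1), ("p = 0.05", 0.05), ("two groups", 0.015)]:
    print(f"{label}: sets {bound_sets(nu):.4f}, d.f. {bound_df(nu):.4f}")

# The d.f. bound is the smaller one exactly when nu < 1 - C / 2.08.
crossover = 1 - C / 2.08
print(f"crossover at nu = {crossover:.3f}")
```

Under these assumptions the crossover is close to the threshold 0.45 quoted above, and the values for $p = 0.1$ and $p = 0.05$ are slightly below those obtained with the rounded constant 1.13.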


2.2 Some other “counting” distributions 


The Poisson distribution plays a central role in the theory. Nevertheless, it is only the simplest of the counting distributions considered in the literature and in practice. Let us turn to other types of counting distributions. We interpret the models below as those of the number of claims during a certain period and corresponding to a certain insurance portfolio.


2.2.1 The mixed Poisson distribution 


Assume that the parameter of a Poisson r.v. is chosen at random in accordance with some 
probability distribution—that is, this parameter is also a random variable. As an example, 
one may consider the situation when the intensity of the flow of claims during a given 
period is determined at the beginning of the period by a random factor reflected in the 
mean value of the number of claims. (For example, we deal with road accidents, and the 
random factor concerns weather conditions.) 

Another interpretation is connected with a division of the population of clients into 
classes. Assume that in each class, the distribution of the number of claims is Poisson 
but the intensity of the corresponding flow is different for different classes. If we do not 
know to which class a particular group under consideration belongs, we can view the Pois- 
son parameter as random. 

Denote the random parameter mentioned by $\Lambda$. Proceeding from the statement of the problem, we apply the Poisson formula to the conditional probability $P(N = n \mid \Lambda = \lambda)$, and write
$$P(N = n \mid \Lambda = \lambda) = e^{-\lambda}\lambda^n/n!.$$

Let $F_\Lambda(\lambda)$ be the d.f. of the r.v. $\Lambda$. By formula (0.7.3.8), which may be viewed as the generalization of the formula for total probability,
$$P(N = n) = \int_0^\infty \left(e^{-\lambda}\lambda^n/n!\right) dF_\Lambda(\lambda).$$


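Given $\Lambda$, the conditional expectation $E\{N \mid \Lambda\} = \Lambda$, and the conditional variance $\mathrm{Var}\{N \mid \Lambda\} = \Lambda$. Then, by (0.7.2.1) and (0.7.3.2),
$$E\{N\} = E\{E\{N \mid \Lambda\}\} = E\{\Lambda\}, \tag{2.2.1}$$
$$\mathrm{Var}\{N\} = E\{\mathrm{Var}\{N \mid \Lambda\}\} + \mathrm{Var}\{E\{N \mid \Lambda\}\} = E\{\Lambda\} + \mathrm{Var}\{\Lambda\}. \tag{2.2.2}$$

Since by (2.2.1), $E\{N\} = E\{\Lambda\}$, we have $\mathrm{Var}\{N\} = E\{N\} + \mathrm{Var}\{\Lambda\}$, so unlike for the Poisson distribution itself, the variance of the mixed Poisson distribution is larger than the mean value (provided $\Lambda$ is not a constant).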


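EXAMPLE 1 is somewhat formal. Let $\Lambda$ be uniform on $[a, b]$, $a > 0$. Then
$$P(N = n) = \int_a^b e^{-\lambda}\frac{\lambda^n}{n!}\,\frac{1}{b-a}\,d\lambda = \frac{1}{n!\,(b-a)}\int_a^b e^{-\lambda}\lambda^n\,d\lambda.$$

The last integral is standard; we will omit the precise formula here. In particular,
$$P(N = 0) = \frac{1}{b-a}\int_a^b e^{-\lambda}\,d\lambda = \frac{e^{-a} - e^{-b}}{b-a}.$$
By (2.2.1) and (2.2.2),
$$E\{N\} = \frac{a+b}{2}, \qquad \mathrm{Var}\{N\} = \frac{a+b}{2} + \frac{(b-a)^2}{12}.$$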

For the m.g.f. of $N$, we have the following nice presentation. Let us recall that for the Poisson distribution with parameter $\lambda$, the m.g.f. $M(z) = \exp\{\lambda(e^z - 1)\}$. Then the m.g.f. of $N$ is
$$M_N(z) = E\{e^{zN}\} = E\{E\{e^{zN} \mid \Lambda\}\} = E\{\exp\{\Lambda(e^z - 1)\}\}.$$

The last expression is the m.g.f. of $\Lambda$ at the point $(e^z - 1)$, and hence
$$M_N(z) = M_\Lambda(e^z - 1), \tag{2.2.3}$$

where $M_\Lambda(z)$ is the m.g.f. of $\Lambda$.


EXAMPLE 2 (The Polya model). This is probably the most notable case. Let $\Lambda$ have the $\Gamma$-distribution with parameters $(a, \nu)$. Then (see Section 0.4.3.5)
$$M_\Lambda(z) = 1/(1 - z/a)^\nu,$$
and by (2.2.3),
$$M_N(z) = 1/(1 - (e^z - 1)/a)^\nu.$$
It is straightforward to verify that $1 - (e^z - 1)/a$ can be rewritten as $(1 - qe^z)/p$, where $p = a/(1+a)$ and $q = 1 - p = 1/(1+a)$. Hence,
$$M_N(z) = \left(\frac{p}{1 - qe^z}\right)^\nu. \tag{2.2.4}$$


The right member of (2.2.4) is the m.g.f. of the negative binomial distribution, that is, the distribution of an integer valued r.v. $K_\nu$ such that for $n = 0, 1, \ldots$
$$P(K_\nu = n) = \binom{\nu + n - 1}{n} p^\nu q^n. \tag{2.2.5}$$

The “binomial coefficient” is defined by the general formula
$$\binom{r}{n} = \frac{r(r-1)\cdots(r-n+1)}{n!}, \tag{2.2.6}$$
where $n$ is an integer, and $r$ is an arbitrary real number, not necessarily an integer (see Sections 0.3.1.4, 0.4.3.2, and Exercise 8 of the current chapter for detail).
So, in general, $\binom{r}{n}$ may be negative (say, for $r = 2.5$ and $n = 4$). However, the coefficient in (2.2.5) is positive because, by (2.2.6),
$$\binom{\nu + n - 1}{n} = \frac{(\nu + n - 1)(\nu + n - 2)\cdots\nu}{n!};$$
see also Exercise 8.
In the particular case $\nu = 1$, the $\Gamma$-distribution above is an exponential distribution and distribution (2.2.5) is geometric, i.e.,
$$P(K_1 = n) = pq^n; \tag{2.2.7}$$
see again Exercise 8.
The negative binomial distribution (2.2.5) serves as a good approximation in many prac- 
tical situations including those that are not relevant to mixing of Poisson distributions. 
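
The generalized binomial coefficient and the negative binomial probabilities are easy to compute for non-integer parameters. The Python sketch below is an illustration with hypothetical function names and hypothetical parameter values $p = 0.4$, $\nu = 2.5$; the coefficient is accumulated as a product of ratios so that large $n$ does not cause overflow.

```python
def gen_binom(r, n):
    """Generalized binomial coefficient r(r-1)...(r-n+1)/n! for real r, integer n >= 0."""
    c = 1.0
    for i in range(n):
        c *= (r - i) / (i + 1)
    return c

def neg_binom_pmf(n, p, nu):
    """Negative binomial probability with q = 1 - p and real nu > 0."""
    return gen_binom(nu + n - 1, n) * p ** nu * (1 - p) ** n

print(gen_binom(2.5, 4))                 # negative, as noted in the text

p, nu = 0.4, 2.5                         # hypothetical parameters
probs = [neg_binom_pmf(n, p, nu) for n in range(200)]
total = sum(probs)
mean_k = sum(n * w for n, w in enumerate(probs))
print(total, mean_k, nu * (1 - p) / p)   # the mean should be nu * q / p

# For nu = 1 the distribution is geometric: P(K_1 = n) = p q^n.
print(neg_binom_pmf(3, p, 1.0), p * (1 - p) ** 3)
```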


EXAMPLE 3 ([153, N10]²). Low Risk Insurance Company provides liability coverage
to a population of 1000 private drivers. The number of claims during a given year from this 
population is Poisson distributed. If a driver is selected at random from this population, his 
expected number of claims per year is a random variable with a Gamma distribution with 
parameters a = 1 and v = 2. Calculate the probability that a driver selected at random will 
not have a claim during the year. 

When saying “the number of claims during a given year”, we mean the conditional distribution
given the conditions of a particular year. We assume this conditional distribution is
Poisson. Let $N_i$ be the number of claims corresponding to the ith client, $i = 1, ..., n = 1000$.
We adopt a model in which each $N_i$ has the mixed Poisson distribution with a random
parameter $\Lambda_i$, and each $\Lambda_i$ has the $\Gamma$-distribution with a = 1, $\nu = 2$. The values of
$\Lambda_1, ..., \Lambda_n$ specify conditions for the year under consideration. Assume that, given
$\Lambda_1, ..., \Lambda_n$, the r.v.’s $N_i$ are independent and, consequently, the (conditional) distribution
of the total number of claims, $N = N_1 + ... + N_n$, is indeed Poisson. Each $N_i$ has the negative
binomial distribution with parameters $p = \frac{a}{1+a} = \frac{1}{2}$ and $\nu = 2$. In accordance with (2.2.5),
$P(N_i = 0) = p^{2} = 0.25$.
Note that we did not assume $\Lambda_i$’s are independent. The only assumption we made is the
conditional independence of $N_i$’s given $\Lambda_1, ..., \Lambda_n$.
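The value $P(N_i = 0) = 0.25$ can be cross-checked numerically. The following is a minimal sketch in Python (standard library only); the function names and the quadrature settings (`upper`, `steps`) are illustrative assumptions, not part of the text. It evaluates the negative binomial probability (2.2.5) directly and compares it with a crude midpoint-rule integration of the Poisson probability against the Γ-density with a = 1, ν = 2.

```python
import math

def neg_binomial_pmf(n, p, nu):
    """P(K_nu = n) from (2.2.5); nu may be any positive real number."""
    q = 1.0 - p
    # log of the generalized binomial coefficient Gamma(nu + n) / (Gamma(nu) * n!)
    log_coef = math.lgamma(nu + n) - math.lgamma(nu) - math.lgamma(n + 1)
    return math.exp(log_coef + nu * math.log(p) + n * math.log(q))

def gamma_density(lam, a, nu):
    """Gamma density with parameters (a, nu): a^nu lam^(nu-1) e^(-a lam) / Gamma(nu)."""
    return a**nu * lam**(nu - 1) * math.exp(-a * lam) / math.gamma(nu)

def mixed_poisson_pmf(n, a, nu, upper=60.0, steps=60000):
    """Midpoint-rule approximation of the integral of e^(-lam) lam^n / n! * f(lam)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        total += math.exp(-lam) * lam**n / math.factorial(n) * gamma_density(lam, a, nu)
    return total * h

a, nu = 1.0, 2.0
p = a / (1.0 + a)
print(neg_binomial_pmf(0, p, nu))    # closed form, close to 0.25
print(mixed_poisson_pmf(0, a, nu))   # numerical mixture, should agree closely
```

The two numbers should agree up to the quadrature error; the same comparison can be repeated for other values of n.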


Reprinted with permission of the Casualty Actuarial Society. 




EXAMPLE 4 (this example is close to the problem [151, N21]). The number of claims 
coming from a group of 200 clients is closely approximated by the negative binomial dis- 
tribution with parameters p = 0.2, v = 3. What distribution would you expect for a group 
of 400 clients of the same type? 

We adopt the model from Example 3: the number of claims, N, is Poisson with a random
parameter $\Lambda = \Lambda_1 + ... + \Lambda_n$, where n is the number of clients. Each $\Lambda_i$ has a $\Gamma$-distribution.
Denote the parameters of this distribution by $a_0, \nu_0$. If $\Lambda$ also has a $\Gamma$-distribution with
parameters $(a, \nu)$, then N has the negative binomial distribution with parameters $p = \frac{a}{1+a}$
and $\nu$.

The mean value $E\{N\} = \frac{\nu(1-p)}{p} = \frac{\nu}{a}$. It is natural to expect that if the number of clients is
twice as large, then the expected number of claims doubles. The question is which parameter
changes: $\nu$ or a? The answer depends on what we assume regarding the r.v.’s $\Lambda_1, ..., \Lambda_n$.


(a) First, suppose that claims coming from each client are independent. Then we may
assume $\Lambda_1, ..., \Lambda_n$ to be independent, which implies that $\Lambda$ has the $\Gamma$-distribution with
parameters $a = a_0$ and $\nu = n\nu_0$. Thus, when n is changing, the parameter a does not change,
while $\nu$ is growing in proportion to n. So, in this case, N has the negative binomial
distribution with the same p = 0.2 and the new $\nu = 2 \cdot 3 = 6$.


(b) Assume now that values of the parameter $\Lambda_i$ are specified by conditions common to
all clients, say, the weather conditions for a particular time interval. Then it is natural to
assume that $\Lambda_1 = ... = \Lambda_n$. In this case $\Lambda = n\Lambda_1$ and has the $\Gamma$-distribution with parameters
$a = a_0/n$ and $\nu = \nu_0$. For n = 200, the parameter p = 0.2, and since $p = \frac{a}{1+a}$, the parameter
a = 1/4. Hence, for n = 400, the parameter a = 1/8, and N is negative binomial with
p = 1/9, and the same $\nu = 3$.
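The two answers can be reproduced with a few lines of arithmetic. The sketch below is only an illustration; the function names `scale_negative_binomial` and `nb_mean` and the boolean flag are our own, hypothetical choices. It inverts $p = a/(1+a)$, rescales either ν (independent clients) or a (common conditions), and confirms that the mean ν(1 − p)/p doubles in both cases.

```python
def scale_negative_binomial(p, nu, factor, common_conditions):
    """Return (p, nu) of the negative binomial law after the group grows by `factor`."""
    a = p / (1.0 - p)                      # invert p = a / (1 + a)
    if common_conditions:
        a_new, nu_new = a / factor, nu     # case (b): Lambda = n * Lambda_1
    else:
        a_new, nu_new = a, nu * factor     # case (a): independent Lambda_i
    return a_new / (1.0 + a_new), nu_new

def nb_mean(p, nu):
    """E{N} = nu (1 - p) / p for the negative binomial distribution (2.2.5)."""
    return nu * (1.0 - p) / p

for flag in (False, True):
    p_new, nu_new = scale_negative_binomial(0.2, 3, 2, common_conditions=flag)
    print(flag, p_new, nu_new, nb_mean(p_new, nu_new))
```

In both cases the printed mean is twice the original mean of 12, while the pair (p, ν) differs, which is the point of the example.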


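Next, we discuss how to determine the distribution of $\Lambda$ once the value of N is known.

Let the symbol $P(\Lambda \in d\lambda)$ mean $P(\Lambda = \lambda)$ when the r.v. $\Lambda$ is discrete, and $f(\lambda)d\lambda$ when
$\Lambda$ is continuous with density $f(\lambda)$. In both cases, $P(\Lambda \in d\lambda)$ may be viewed as the
probability that $\Lambda$ will take on a value from the infinitesimally small interval $[\lambda, \lambda + d\lambda]$. Then


$$P(\Lambda \in d\lambda \mid N = n) = \frac{P(\Lambda \in d\lambda,\, N = n)}{P(N = n)} = \frac{P(N = n \mid \Lambda = \lambda)\, P(\Lambda \in d\lambda)}{P(N = n)} = \frac{e^{-\lambda}\lambda^{n}}{n!} \cdot \frac{P(\Lambda \in d\lambda)}{P(N = n)}.$$


If $\Lambda$ takes on values $\lambda_1, \lambda_2, ...$ with probabilities $p_1, p_2, ...$, respectively, then the last relation
may be written as


$$P(\Lambda = \lambda_k \mid N = n) = \frac{e^{-\lambda_k}\lambda_k^{n}\, p_k}{n!\, P(N = n)}, \qquad (2.2.8)$$

where

$$P(N = n) = \sum_{k} \frac{e^{-\lambda_k}\lambda_k^{n}}{n!}\, p_k. \qquad (2.2.9)$$


If $\Lambda$ has density $f(\lambda)$, then


$$P(\Lambda \in d\lambda \mid N = n) = \frac{e^{-\lambda}\lambda^{n} f(\lambda)\, d\lambda}{n!\, P(N = n)}, \qquad (2.2.10)$$


where


$$P(N = n) = \int_0^{\infty} \frac{e^{-\lambda}\lambda^{n}}{n!} f(\lambda)\, d\lambda. \qquad (2.2.11)$$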


Formulas (2.2.8) and (2.2.10) are versions of Bayes’s formula (see Section 0.1.2) for the 
discrete and continuous cases, respectively. In Exercise 9, the reader is invited to verify this 
for the case of two values of A. 


EXAMPLE 5. Let the number of daily claims for an auto insurance portfolio be a Poisson 
r.v. with a mean of three on a “good” day, and ten on a “bad” day. The probability of a 
good day is 3/4, for a bad day, accordingly, 1/4. Eight claims have arrived by the end 
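of a day. What is the probability that the day falls into the category “bad”? By (2.2.8)-(2.2.9),


$$P(\Lambda = 10 \mid N = 8) = \frac{e^{-10}\,\frac{10^{8}}{8!}\cdot\frac{1}{4}}{e^{-3}\,\frac{3^{8}}{8!}\cdot\frac{3}{4} + e^{-10}\,\frac{10^{8}}{8!}\cdot\frac{1}{4}} \approx 0.822;$$


compare with the prior unconditional probability 0.25.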
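The posterior probability in this example is easy to recompute. The following sketch applies the discrete Bayes formula (2.2.8)-(2.2.9) to an arbitrary finite list of values of Λ; the helper names `poisson_pmf` and `posterior_discrete` are illustrative and not taken from the text.

```python
import math

def poisson_pmf(n, lam):
    """Poisson probability e^(-lam) lam^n / n!."""
    return math.exp(-lam) * lam**n / math.factorial(n)

def posterior_discrete(n, values, priors):
    """Posterior P(Lambda = lam_k | N = n) from (2.2.8)-(2.2.9)."""
    joint = [poisson_pmf(n, lam) * pr for lam, pr in zip(values, priors)]
    total = sum(joint)                 # this is P(N = n), formula (2.2.9)
    return [j / total for j in joint]

# "Good" day: mean 3, prior 3/4; "bad" day: mean 10, prior 1/4; eight claims observed.
post = posterior_discrete(8, [3.0, 10.0], [0.75, 0.25])
print(round(post[1], 3))   # about 0.822, the probability of a "bad" day
```

The same function handles more than two categories of days without change.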


EXAMPLE 6. Consider the situation of Example 2. The density of the r.v. $\Lambda$ is the
$\Gamma$-density $f(\lambda) = \frac{a^{\nu}}{\Gamma(\nu)}\lambda^{\nu-1}e^{-a\lambda}$. Then, by (2.2.10), the conditional density of $\Lambda$ given
N = n is

$$f(\lambda \mid N = n) = \frac{e^{-\lambda}\lambda^{n} f(\lambda)}{n!\,P(N = n)} = \frac{1}{n!\,P(N = n)}\, e^{-\lambda}\lambda^{n}\, \frac{a^{\nu}}{\Gamma(\nu)}\lambda^{\nu-1}e^{-a\lambda} = C(n,a,\nu)\,\lambda^{n+\nu-1}e^{-(a+1)\lambda},$$


where $C(n,a,\nu)$ is an expression that depends on $n, a, \nu$, and does not depend on $\lambda$. We can
certainly compute $C(n,a,\nu)$, but it is not necessary. The term $\lambda^{n+\nu-1}e^{-(a+1)\lambda}$ says that we
are dealing with the $\Gamma$-distribution with parameters $a+1,\ n+\nu$. Then $C(n,a,\nu)$ must be
equal to $(a+1)^{n+\nu}/\Gamma(n+\nu)$. The reader may double-check this taking into account that
P(N = n) is given in (2.2.5) with p = a/(a+1), but (if we have not made a mistake in our
calculations) it is unnecessary.

In particular, since $E\{\Lambda \mid N = n\}$ is the mean value of the $\Gamma$-distribution mentioned, we
have

$$E\{\Lambda \mid N = n\} = \frac{n + \nu}{a + 1}. \qquad (2.2.12)$$


Set $\lambda_{ave} = E\{\Lambda\}$, the unconditional mean value, and realize that in our situation, $\lambda_{ave} = \frac{\nu}{a}$
as the mean value of the $\Gamma$-distribution with parameters $a, \nu$. Then, (2.2.12) may be
rewritten as

$$E\{\Lambda \mid N = n\} = \frac{n + \nu}{\lambda_{ave} + \nu}\,\lambda_{ave}. \qquad (2.2.13)$$


This is simple and nice. If the observed value of n is larger (smaller) than $\lambda_{ave}$, then
the conditional expected value of $\Lambda$, given the observed information, is larger (smaller)
than $\lambda_{ave}$.

Let, for example, a = 2, $\nu = 4$. Then $\lambda_{ave} = E\{\Lambda\} = \nu/a = 2$. Suppose that N took on
the value 5. Then the posterior distribution of $\Lambda$ is the $\Gamma$-distribution with new parameters
$\tilde{a} = a + 1 = 3$ and $\tilde{\nu} = n + \nu = 5 + 4 = 9$. In particular, $E\{\Lambda \mid N = 5\} = \frac{9}{3} = 3$.
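The update rule of this example (a becomes a + 1, ν becomes n + ν) can be checked against a direct numerical evaluation of the posterior mean. The sketch below is illustrative only: the function names and the integration range are assumptions of ours, and the integral is computed by a simple midpoint rule rather than by any method used in the text.

```python
import math

def gamma_posterior(a, nu, n):
    """Parameters (a + 1, n + nu) of the Gamma posterior of Lambda after observing N = n."""
    return a + 1.0, nu + n

def posterior_mean_numeric(a, nu, n, upper=80.0, steps=80000):
    """E{Lambda | N = n} as a ratio of midpoint-rule integrals of the unnormalized posterior."""
    h = upper / steps
    num = den = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        w = math.exp(-(a + 1.0) * lam) * lam**(n + nu - 1.0)   # proportional to e^(-lam) lam^n f(lam)
        num += lam * w
        den += w
    return num / den

a_post, nu_post = gamma_posterior(2.0, 4.0, 5)
print(a_post, nu_post, nu_post / a_post)      # new parameters and the mean (n + nu)/(a + 1)
print(posterior_mean_numeric(2.0, 4.0, 5))    # numerical check, should be close to 3
```

Because the normalizing constant cancels in the ratio, the numerical check does not need P(N = n), in line with the remark that computing C(n, a, ν) is unnecessary.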




2.2.2 Compound mixing 


Let the total number of claims

$$N = Y_1 + ... + Y_K, \qquad (2.2.14)$$


where integer-valued r.v.’s $Y_1, Y_2, ...$ and K are independent, and the r.v.’s $Y_1, Y_2, ...$ are
identically distributed. For example, K may be the number of accidents corresponding to a risk
portfolio and $Y_i$ is the number of claims due to the ith accident.

The scheme (2.2.14) is called compound mixing. The distribution of K is called primary,
and the distribution of the Y’s, secondary.

We see that (2.2.14) is a particular instance of the scheme from Section 1: the role of N
from (0.1) is played by K, and the role of S by N.

Applying (1.5) to (2.2.14), we have


$$M_N(z) = M_K(\ln M_Y(z)), \qquad (2.2.15)$$

where $M_Y(z)$ is the m.g.f. of Y’s. If K is Poisson with parameter $\lambda$, by (1.6),

$$M_N(z) = \exp\{\lambda(M_Y(z) - 1)\}. \qquad (2.2.16)$$

EXAMPLE 1. Let K have the Poisson distribution with parameter $\lambda$, and for all i,


$$Y_i = \begin{cases} 1 & \text{with probability } p, \\ 0 & \text{with probability } 1 - p. \end{cases}$$


In the particular case of the example with accidents, it would mean that each accident
causes either one claim or no claims. For example, $Y_i$ may equal zero if the size of the
damage at the ith accident does not exceed a deductible.

It is noteworthy that in this case N is also a Poisson r.v. with parameter $p\lambda$. Indeed,


$$M_Y(z) = p e^{z} + (1 - p) = 1 + p(e^{z} - 1), \qquad (2.2.17)$$


and by (2.2.16),

$$M_N(z) = \exp\{p\lambda(e^{z} - 1)\}.$$

The last function is the Poisson m.g.f. with parameter $p\lambda$.


We will return to this in Section 3.1.2 where we give another solution to the same problem 
using a special feature of the Poisson distribution. 


EXAMPLE 2. Let K be the same as in Example 1, and let each $Y_i$ take on the values 0, 1, 2
with probabilities $p_0, p_1, p_2$, respectively. Then
$M_Y(z) = p_0 + p_1 e^{z} + p_2 e^{2z} = 1 + p_1(e^{z} - 1) + p_2(e^{2z} - 1)$, and by (2.2.16),


$$M_N(z) = \exp\{\lambda(M_Y(z) - 1)\} = \exp\{\lambda[p_1(e^{z} - 1) + p_2(e^{2z} - 1)]\} = \exp\{\lambda p_1(e^{z} - 1)\}\exp\{\lambda p_2(e^{2z} - 1)\}.$$


The last expression is the product of two m.g.f.’s. The first corresponds to a r.v. $\tilde{Y}_1$ having
the Poisson distribution with parameter $\lambda p_1$. A r.v. whose m.g.f. equals the second factor,
$\exp\{\lambda p_2(e^{2z} - 1)\}$, may be represented as $2\tilde{Y}_2$, where $\tilde{Y}_2$ has the Poisson distribution with
parameter $\lambda p_2$. Indeed, $E\{e^{z \cdot 2\tilde{Y}_2}\} = E\{e^{(2z)\tilde{Y}_2}\}$, that is, the m.g.f. of $\tilde{Y}_2$ at the point 2z. (See
also formula (0.4.1.5) for the m.g.f.’s of the linear transformations of r.v.’s.)

Thus, we can write the representation


$$N = \tilde{Y}_1 + 2\tilde{Y}_2, \qquad (2.2.18)$$


where $\tilde{Y}_1, \tilde{Y}_2$ are independent Poisson r.v.’s with parameters $p_1\lambda$ and $p_2\lambda$, respectively.

In order to compute E{N} and Var{N}, we can use (1.2) and (1.4), or proceed from
the representation (2.2.18), which implies that $E\{N\} = p_1\lambda + 2p_2\lambda = (p_1 + 2p_2)\lambda$ and
$Var\{N\} = p_1\lambda + 4p_2\lambda = (p_1 + 4p_2)\lambda$. This example is continued in Exercise 15.

Note also that like Example 1, this example will be also revisited in Section 3.1.2. In the 
current section, we use a general approach based on m.g.f.’s; in Section 3.1.2 we proceed 
from a special feature of the Poisson distribution. 
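Representation (2.2.18) also gives a direct way to tabulate the distribution of N. The sketch below is a minimal illustration with made-up numerical values (λ = 3, p_1 = 0.5, p_2 = 0.2 are our assumptions, not data from the text); it convolves the two independent Poisson r.v.’s from (2.2.18) and recovers the mean and variance stated in the example.

```python
import math

def poisson_pmf(n, lam):
    """Poisson probability e^(-lam) lam^n / n!."""
    return math.exp(-lam) * lam**n / math.factorial(n)

def pmf_via_representation(lam, p1, p2, n_max):
    """pmf of N = (first Poisson) + 2 * (second Poisson) on 0..n_max, as in (2.2.18)."""
    pmf = [0.0] * (n_max + 1)
    for n in range(n_max + 1):
        for j in range(n // 2 + 1):        # j is the value of the second Poisson r.v.
            pmf[n] += poisson_pmf(j, p2 * lam) * poisson_pmf(n - 2 * j, p1 * lam)
    return pmf

lam, p1, p2 = 3.0, 0.5, 0.2                # illustrative values only
pmf = pmf_via_representation(lam, p1, p2, n_max=80)
mean = sum(n * w for n, w in enumerate(pmf))
var = sum(n * n * w for n, w in enumerate(pmf)) - mean**2
print(mean, (p1 + 2 * p2) * lam)           # both close to 2.7
print(var, (p1 + 4 * p2) * lam)            # both close to 3.9
```

The truncation level `n_max` only needs to be large enough for the neglected tail to be negligible.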


EXAMPLE 3. Let K have the Poisson distribution with parameter $\lambda_1$, and all $Y_i$ have the
Poisson distribution with parameter $\lambda_2$. Then the m.g.f.


$$M_N(z) = \exp\{\lambda_1(\exp\{\lambda_2(e^{z} - 1)\} - 1)\}.$$


This distribution is called Poisson-Poisson. From (1.2) and (1.4) it immediately follows
that $E\{N\} = \lambda_1\lambda_2$ and $Var\{N\} = \lambda_1(\lambda_2 + \lambda_2^{2})$.


EXAMPLE 4. Let K be negative binomial with parameters $p_1$ and $\nu$, and all $Y_i$ take on
values 1 or 0 with probabilities $p_2$ and $1 - p_2$, respectively. Let $q_i = 1 - p_i$, i = 1, 2. By
(2.2.4), $M_K(z) = [p_1/(1 - q_1 e^{z})]^{\nu}$. Hence, in view of (2.2.15) and (2.2.17),


$$M_N(z) = \left[ \frac{p_1}{1 - q_1\exp\{\ln(1 + p_2(e^{z} - 1))\}} \right]^{\nu} = \left[ \frac{p_1}{1 - q_1(1 + p_2(e^{z} - 1))} \right]^{\nu}. \qquad (2.2.19)$$


Let $\tilde{p} = (1 - q_1)/(1 - q_1 q_2)$ and $\tilde{q} = 1 - \tilde{p}$. It is straightforward to verify that the expression
in the brackets [·] on the r.-h.s. of (2.2.19) may be rewritten as $\tilde{p}/(1 - \tilde{q}e^{z})$, and hence N
has the negative binomial distribution with parameters $\tilde{p}, \nu$.
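The verification mentioned in this example takes only a few lines. A worked version is sketched below; it uses nothing beyond $q_i = 1 - p_i$ and the definitions $\tilde{p} = (1 - q_1)/(1 - q_1 q_2)$, $\tilde{q} = 1 - \tilde{p}$ (the tilde notation is our rendering of the symbols for the new parameters).

```latex
\begin{aligned}
1 - q_1\bigl(1 + p_2(e^{z} - 1)\bigr)
  &= 1 - q_1 q_2 - q_1 p_2 e^{z}
   = (1 - q_1 q_2)\Bigl(1 - \frac{q_1 p_2}{1 - q_1 q_2}\, e^{z}\Bigr), \\
\tilde{q} = 1 - \tilde{p}
  &= \frac{(1 - q_1 q_2) - (1 - q_1)}{1 - q_1 q_2}
   = \frac{q_1 p_2}{1 - q_1 q_2}, \\
\frac{p_1}{1 - q_1\bigl(1 + p_2(e^{z} - 1)\bigr)}
  &= \frac{(1 - q_1)/(1 - q_1 q_2)}{1 - \tilde{q}\, e^{z}}
   = \frac{\tilde{p}}{1 - \tilde{q}\, e^{z}} .
\end{aligned}
```

Raising the last expression to the power ν gives the negative binomial m.g.f. (2.2.4) with the new parameter in place of p.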


2.2.3 The (a,b,0) and (a,b,1) (or Katz-Panjer’s) classes 


Let $p_k = P(N = k)$. We say that the distribution of N belongs to the (a,b,0) class if there
exist a and b such that

$$\frac{p_k}{p_{k-1}} = a + \frac{b}{k} \qquad (2.2.20)$$

for all k = 1, 2, ... . It may be proved (see, e.g., [98]) that the only possible distributions
satisfying (2.2.20) are those listed in Table 1 below.


TABLE 1.
    | Distributions                                      | $p_0$          | a    | b
  1 | Poisson with $\lambda$                             | $e^{-\lambda}$ | 0    | $\lambda$
  2 | Geometric with p and q = 1 - p                     | p              | q    | 0
  3 | Negative binomial with $\nu$, p and q = 1 - p      | $p^{\nu}$      | q    | $(\nu - 1)q$
  4 | Binomial with n and p and q = 1 - p                | $q^{n}$        | -p/q | (n+1)p/q




Let us check, for example, Position 3. For the negative binomial distribution, in view of
(2.2.5), $p_0 = p^{\nu}$, and


$$\frac{p_k}{p_{k-1}} = \frac{\binom{\nu+k-1}{k} p^{\nu} q^{k}}{\binom{\nu+k-2}{k-1} p^{\nu} q^{k-1}} = \frac{\nu + k - 1}{k}\, q = q + \frac{(\nu - 1)q}{k},$$


which leads to the result in the table. The consideration of the other table entries is relegated
to Exercise 19.


EXAMPLE 1 ([153, N40]?). You are given a negative binomial distribution with v = 2.5 
and p = 1/6. For what value of k does $p_k$ take on its largest value?

For the negative binomial distribution, $p_k = p_{k-1}\varphi(k)$ where $\varphi(k) = q + \frac{(\nu - 1)q}{k}$. In our
example, $\varphi(k) = \frac{5}{6}\left(\frac{1.5}{k} + 1\right)$. We are looking for k such that $\varphi(k) \geq 1$ and $\varphi(k+1) < 1$. This
is k = 7.
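The answer can be confirmed by running the recursion (2.2.20) itself. The sketch below is illustrative (the function name `ab0_probabilities` and the cut-off `k_max` are our own choices); it builds the probabilities from $p_0 = p^{\nu}$ with a = q and b = (ν − 1)q, as in Table 1, and locates the largest one.

```python
def ab0_probabilities(p0, a, b, k_max):
    """Probabilities p_0, ..., p_kmax from the recursion p_k = p_{k-1} (a + b/k), see (2.2.20)."""
    probs = [p0]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * (a + b / k))
    return probs

nu, p = 2.5, 1.0 / 6.0
q = 1.0 - p
probs = ab0_probabilities(p**nu, a=q, b=(nu - 1.0) * q, k_max=400)
mode = max(range(len(probs)), key=probs.__getitem__)
print(mode)         # the index of the largest probability
print(sum(probs))   # close to 1 when k_max is large enough
```

The same function covers every row of Table 1 once the corresponding $p_0$, a and b are supplied.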


We say that the distribution of N belongs to the (a,b,1) class if (2.2.20) is true for some
a and b, and all k = 2, 3, .... The difference is that we begin to construct probabilities
recursively starting from $p_1$ rather than $p_0$, reserving the possibility of defining $p_0$ as we
wish, or more precisely, as it fits the data. Such a modification allows us to construct
distributions having a structure close to “classical” but adjusted to particular real situations.

The distribution with $p_0 = 0$ is called zero-truncated; otherwise, zero-modified.

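Consider, for example, the Poisson case when a = 0, $b = \lambda$, and $p_0$ is fixed. For other
probabilities,


$$p_2 = \frac{\lambda p_1}{2}, \quad p_3 = \frac{\lambda p_2}{3} = \frac{\lambda^{2} p_1}{2 \cdot 3}, \quad p_4 = \frac{\lambda p_3}{4} = \frac{\lambda^{3} p_1}{2 \cdot 3 \cdot 4},$$

and so on, which leads to

$$p_k = \frac{\lambda^{k-1} p_1}{k!}.$$


On the other hand, $1 - p_0 = \sum_{k=1}^{\infty} p_k = p_1 \sum_{k=1}^{\infty} (\lambda^{k-1}/k!) = p_1\lambda^{-1}\sum_{k=1}^{\infty}(\lambda^{k}/k!) = p_1\lambda^{-1}(e^{\lambda} - 1)$.
Thus, $p_1 = (1 - p_0)\lambda(e^{\lambda} - 1)^{-1}$, and for k = 1, 2, ...

$$p_k = \frac{1 - p_0}{e^{\lambda} - 1} \cdot \frac{\lambda^{k}}{k!}. \qquad (2.2.21)$$


If we set $p_0 = e^{-\lambda}$, as in the “usual” Poisson distribution, we come to Poisson probabilities
for all k. If $p_0 = 0$,

$$p_k = \frac{1}{e^{\lambda} - 1} \cdot \frac{\lambda^{k}}{k!},$$

which coincides with P(Z = k | Z > 0) where Z is a “usual” Poisson r.v. Other examples are
considered in Exercise 20.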
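Formula (2.2.21) translates directly into code. The sketch below is a minimal illustration; the function name `zero_modified_poisson` and the sample values λ = 2, $p_0$ = 0.3 are assumptions of ours. It returns the zero-modified Poisson probabilities for a chosen $p_0$.

```python
import math

def zero_modified_poisson(lam, p0, k_max):
    """Return [p_0, ..., p_kmax] for the (a,b,1) Poisson case, formula (2.2.21)."""
    probs = [p0]
    scale = (1.0 - p0) / (math.exp(lam) - 1.0)
    term = 1.0
    for k in range(1, k_max + 1):
        term *= lam / k                  # term equals lam^k / k!
        probs.append(scale * term)
    return probs

modified = zero_modified_poisson(2.0, 0.3, 60)             # p_0 chosen freely
truncated = zero_modified_poisson(2.0, 0.0, 60)            # zero-truncated case
usual = zero_modified_poisson(2.0, math.exp(-2.0), 60)     # recovers the usual Poisson law
print(sum(modified), sum(truncated), sum(usual))
```

For k ≥ 2 the ratios of consecutive probabilities equal λ/k whatever $p_0$ is, which is exactly the (a,b,1) property with a = 0 and b = λ.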

> A formula similar to (2.2.21) is true in the general case. Let N be an original r.v. and 
$p_k = P(N = k)$ be the corresponding original probabilities for which


$$p_k = p_{k-1}\varphi(k) \qquad (2.2.22)$$


Reprinted with permission of the Casualty Actuarial Society. 




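for k = 1, 2, ..., where $\varphi(k)$ is a given function. In the particular case above, $\varphi(k) = a + b/k$.


Proposition 8 Let $\tilde{p}_k$ be modified probabilities, that is,


$$\tilde{p}_k = \tilde{p}_{k-1}\varphi(k) \qquad (2.2.23)$$

for k = 2, 3, .... Then

$$\tilde{p}_k = (1 - \tilde{p}_0)\,\frac{p_k}{\sum_{m=1}^{\infty} p_m}. \qquad (2.2.24)$$

Proof. In view of (2.2.23), $\tilde{p}_k = \varphi(k)\tilde{p}_{k-1} = \varphi(k)\varphi(k-1)\tilde{p}_{k-2} = ... = \varphi(k)\varphi(k-1)\cdots\varphi(2)\,\tilde{p}_1$,
that is, $\tilde{p}_k = \tilde{p}_1 w(k)$, where $w(k) = \varphi(k)\varphi(k-1)\cdots\varphi(2)$, and w(1) = 1. We have
$1 - \tilde{p}_0 = \sum_{k=1}^{\infty}\tilde{p}_k = \tilde{p}_1\sum_{k=1}^{\infty} w(k)$ and $\tilde{p}_1 = (1 - \tilde{p}_0)/\sum_{k=1}^{\infty} w(k)$. Thus,


$$\tilde{p}_k = (1 - \tilde{p}_0)\,\frac{w(k)}{\sum_{m=1}^{\infty} w(m)}. \qquad (2.2.25)$$


On the other hand, $p_1 = \varphi(1)p_0$, and $p_k = p_1 w(k) = p_0\varphi(1)w(k)$. Multiplying the
numerator and the denominator in (2.2.25) by $p_0\varphi(1)$, we come to (2.2.24). ■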


3 THE DISTRIBUTION OF THE AGGREGATE CLAIM 
3.1 The case of a homogeneous group 


First, we consider a homogeneous group of clients and claims coming from this group.
Namely, we consider a fixed time period and assume that the total claim S during this
period is represented by relation (0.1) where the sizes of claims, $X_j$, are independent and
identically distributed (i.i.d.) r.v.’s, and the total number of claims, N, is independent of the
X’s. If N = 0, we set S = 0. Unless stated otherwise, we suppose $P(X_j > 0) = 1$ for all j.

Propositions 1 and 2 give a clear way to compute E{S} and Var{S}. Examples are given
in Exercise 22.

Our object of study is the distribution of S. There are several approaches to computing 
or estimating this distribution. 


3.1.1 The convolution method 


Let $g_n = P(N = n)$ and F(x) be the d.f. of $X_j$. Since all X’s are positive with probability
one,


$$P(S = 0) = P(N = 0) = g_0, \qquad (3.1.1)$$


and the d.f. $F_S(x)$ has a “jump” of $g_0$ at x = 0. Furthermore, for x > 0,


$$F_S(x) = P(S \leq x) = \sum_{n=0}^{\infty} P(S \leq x \mid N = n)\,P(N = n) = \sum_{n=0}^{\infty} P(S_n \leq x \mid N = n)\, g_n,$$


where, as usual, $S_n = X_1 + ... + X_n$, and $S_0 = 0$. The X’s do not depend on N, and they are
mutually independent. Hence for $n \geq 1$, we have $P(S_n \leq x \mid N = n) = P(S_n \leq x) = F^{*n}(x)$,
where the symbol $F^{*n}$ denotes $F * ... * F$, the nth convolution of F (see Section 2.2.1.1).
Thus,


$$F_S(x) = \sum_{n=0}^{\infty} g_n F^{*n}(x), \qquad (3.1.2)$$


where $F^{*0}(x)$ (which corresponds to the case when N = 0) is defined as the d.f. of a r.v.
$X \equiv 0$, that is, $F^{*0}(x) = 1$ for $x \geq 0$, and = 0 for x < 0.

If the density $f(x) = F'(x)$ exists, the density $f_{S_n}(x)$ also exists for all n except zero.
Since the derivative $(F^{*0}(x))' = 0$ for x > 0, the d.f. $F_S(x)$ is differentiable for all x > 0. We
call the corresponding derivative the density of S, which exists for all x > 0. Eventually,
differentiating (3.1.2), we get that for x > 0,


$$f_S(x) = \sum_{n=1}^{\infty} g_n f^{*n}(x), \qquad (3.1.3)$$


where $f^{*n}$ is the nth convolution of the density f; see 2.2.1.1. (Notice that the summation
in (3.1.3) starts from n = 1 since $\frac{d}{dx}F^{*0}(x) = 0$ for x > 0.)

The same formula is true for the case when the X’s are discrete r.v.’s if we understand
$f_S(x)$ as P(S = x), and $f^{*n}(x)$ as $P(S_n = x)$. The last probability is the result of the nth
convolution of the distribution of addends.

Formulas (3.1.2)-(3.1.3) may be useful when we can write a good representation or an
approximation for all convolutions $F^{*n}$. This is true at least for some special distributions,
which will be demonstrated in the following examples.


EXAMPLE 1. Let all $X_j$ have the $\Gamma$-distribution with parameters $(a, \nu)$. In particular, for
$\nu = 1$, it is the exponential distribution with parameter a. Denote the corresponding d.f. by
$\Gamma(x; a, \nu)$. By Proposition 2.2.1.2-4, $S_n$ has the d.f. $\Gamma(x; a, n\nu)$, and if f(x) is the density of
$X_i$, then the density of $S_n$ is


$$f^{*n}(x) = \frac{a^{n\nu}}{\Gamma(n\nu)}\, x^{n\nu - 1} e^{-ax}.$$

If $\nu$ is an integer, the function $\Gamma(x; a, \nu)$ can be written explicitly (see later Section 4.2.1),
but for now it matters only that it is a “good” tractable function. By (3.1.2), we have

$$F_S(x) = \sum_{n=0}^{\infty} g_n \Gamma(x; a, n\nu). \qquad (3.1.4)$$


For the density $f_S(x)$ and x > 0,


$$f_S(x) = \sum_{n=1}^{\infty} g_n \frac{a^{n\nu}}{\Gamma(n\nu)}\, x^{n\nu - 1} e^{-ax}. \qquad (3.1.5)$$


If the r.v. N takes on a finite and not large number of values (and hence only a finite
number of $g_n$’s are greater than zero), then the sums in (3.1.2) and (3.1.3) are finite and
may be computed numerically.

If N takes on a large or infinite number of values, we have to use truncation. Assume
that we restricted ourselves to the summation in (3.1.2) over n from 0 to some k. Then the
error will be equal to


$$\sum_{n=k+1}^{\infty} g_n F^{*n}(x) \leq \sum_{n=k+1}^{\infty} g_n = P(N > k). \qquad (3.1.6)$$


This probability may be small for a relatively moderate k.


EXAMPLE 2. Let N have the geometric distribution in the form (0.3.1.9) with parameter p.
Then $P(N > k) = (1-p)^{k+1}$; see (0.3.1.10). Note also that $E\{N\} = (1-p)/p$. Assume
that there are 50 claims on the average, i.e., E{N} = 50. Then, as is easy to calculate,
P(N > k) < 0.05 for k > 150. So, if 0.05 is acceptable accuracy for us, we can consider
the sum truncated at such a k, which is computable even for a computer with moderate
performance. For E{N} = 20, we would have P(N > 60) < 0.051 and, for example, P(N > 80) < 0.02.


It makes sense to emphasize that the real error of truncation may be substantially smaller
than estimate (3.1.6), since above we replaced $F^{*n}(x)$ by its roughest bound, namely, one.
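The truncation levels quoted for the geometric case follow from the bound $P(N > k) = (1-p)^{k+1}$ with $p = 1/(1 + E\{N\})$. A minimal sketch (the function name `geometric_truncation_level` is our own, illustrative choice):

```python
def geometric_truncation_level(mean_n, eps):
    """Smallest k with P(N > k) = (1 - p)^(k + 1) <= eps, where E{N} = (1 - p)/p."""
    p = 1.0 / (1.0 + mean_n)
    k = 0
    while (1.0 - p) ** (k + 1) > eps:
        k += 1
    return k

print(geometric_truncation_level(50, 0.05))   # first k for which the bound drops to 0.05
print((20.0 / 21.0) ** 61)                    # P(N > 60) when E{N} = 20
print((20.0 / 21.0) ** 81)                    # P(N > 80) when E{N} = 20
```

Since the bound replaces every $F^{*n}(x)$ by one, the level returned here is conservative; the actual truncation error for a given x is usually much smaller.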


EXAMPLE 3 illustrates the possibilities of concrete calculations. Let us return to (3.1.4)
and assume that N has a geometric distribution with parameter p. In Fig. 2, the corresponding
Excel worksheet is presented. The values of p, q = 1 - p, $\nu$, and a are in cells G2,
G3, G6, G7, respectively. In G9 and G10, $m_S = E\{S\}$ and $\sigma_S = \sqrt{Var\{S\}}$ are computed in
accordance with (1.1) and (1.3). For the distributions chosen,


$$E\{S\} = \frac{q}{p}\cdot\frac{\nu}{a}, \qquad Var\{S\} = \frac{q}{p}\cdot\frac{\nu}{a^{2}} + \frac{q}{p^{2}}\cdot\frac{\nu^{2}}{a^{2}} \qquad (3.1.7)$$

(in Exercise 23 the reader is encouraged to verify these formulas). Once we know $m_S$ and
$\sigma_S$, we can choose (cell G5) a reasonable value of x, say, in the range $(m_S - 3\sigma_S,\ m_S + 3\sigma_S)$.
Values of n and the corresponding terms in (3.1.4) are in columns A and B. The Excel
command for, say, B3 is

B3=$G$2*$G$3^A3*GAMMADIST($G$5,A3*$G$6,$G$7^(-1),TRUE)

The estimates for P(S ≤ x) are in column D and obtained by summing over the first
k terms in column B. (Not all rows are shown in Fig. 2.) We see that the values of the
estimates for k = 300 and k = 200 do not differ significantly, which allows us to suppose
that for these k’s, the estimate of 0.87 is good enough, while the estimates for k = 150 and
lower are not. Certainly, making use of a more sophisticated program, we can easily get
values of $F_S(x)$ for a large spectrum of x’s. Table 2 gives five values of the d.f. for the
parameters as in Fig. 2.

TABLE 2.

x        | 20   | 50   | 75   | 100  | 150
$F_S(x)$ | 0.34 | 0.64 | 0.78 | 0.87 | 0.95
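The worksheet calculation can also be sketched in a few lines of Python. The version below is an illustration under stated assumptions, not the author’s program: it treats only integer ν, so that $\Gamma(x; a, m)$ is the Erlang d.f. and equals the probability that a Poisson r.v. with mean ax is at least m, and it uses the values p = 0.01, ν = 1, a = 2, which we inferred from the worksheet because they reproduce $m_S = 49.5$ and $\sigma_S \approx 49.9975$ via (3.1.7). The function name is hypothetical.

```python
import math

def compound_geometric_erlang_cdf(x, p, a, nu, k):
    """Truncated sum (3.1.4): sum over n = 0..k of g_n * Gamma(x; a, n*nu), g_n = p q^n.

    Assumes nu is a positive integer and x >= 0, and uses
    Gamma(x; a, m) = P(Poisson(a*x) >= m) for integer m >= 1.
    """
    q = 1.0 - p
    # Poisson(a*x) probabilities for 0, 1, ..., k*nu - 1, built recursively.
    pois = [math.exp(-a * x)]
    for j in range(1, k * nu):
        pois.append(pois[-1] * a * x / j)
    total = p          # n = 0 term: g_0 * F^{*0}(x) = p for x >= 0
    below = 0.0        # running value of P(Poisson(a*x) < n*nu)
    used = 0
    for n in range(1, k + 1):
        while used < n * nu:
            below += pois[used]
            used += 1
        total += p * q**n * (1.0 - below)
    return total

for x in (20.0, 50.0, 75.0, 100.0, 150.0):
    print(x, round(compound_geometric_erlang_cdf(x, 0.01, 2.0, 1, 500), 2))
```

With these assumed parameters the printed values can be compared with Table 2. For ν = 1 there is also an exact benchmark, $F_S(x) = 1 - q e^{-pax}$ (the mixture of a unit mass at zero and an exponential law derived later by the m.g.f. method), which gives an independent check of the truncation. For very large ax the starting term $e^{-ax}$ underflows, so this simple recursion is only suitable for moderate arguments.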


Consider two more examples. 


[Worksheet image. Column A: n; column B: the terms $g_n P(S_n \leq x)$; column D: the estimates of
$P(S \leq x)$, namely 0.86602 (k=300), 0.85913 (k=200), 0.78076 (k=150), 0.63763 (k=100),
0.40104 (k=50). Computed cells: $m_S$ = 49.5, $\sigma_S$ = 49.9975, $m_S - 3\sigma_S$ = -100.492,
$m_S + 3\sigma_S$ = 199.492.]

FIGURE 2. The worksheet for Example 3. Not all 300 numbers
in Columns A and B are shown.


EXAMPLE 4. Assume that the distribution of X’s may be well approximated by the
$(m, \sigma^{2})$-normal distribution. Then, by Proposition 2, Section 2.2.1.2, the distribution of
$S_n$ may be approximated by the $(nm, n\sigma^{2})$-normal distribution, and we can set
$F^{*n}(x) = \Phi\left(\frac{x - nm}{\sigma\sqrt{n}}\right)$ (verify this on your own, or see Section 0.3.2.4). Hence,


$$F_S(x) = \sum_{n=0}^{\infty} g_n \Phi\left( \frac{x - nm}{\sigma\sqrt{n}} \right). \qquad (3.1.8)$$


In Exercise 25, the reader is invited to provide an Excel worksheet and carry out calculations
based on (3.1.8).


EXAMPLE 5. Let $X_j$ take on two values, a and b, with probabilities p and q, respectively.
Assume a > b > 0. Clearly, all values of S are combinations ka + mb, where k and m are
integers. Then $P(S_n = x) \neq 0$ only if x = ka + (n - k)b for some integer k with $0 \leq k \leq n$. In this case,
k = k(x,n) = (x - nb)/(a - b), and $P(S_n = x) = \binom{n}{k} p^{k} q^{n-k}$.

Note also that since the biggest value of $S_n$ is na and the smallest is nb, $P(S_n = x) = 0$
for all x > na or x < nb, which is equivalent to n < x/a or n > x/b, respectively. Thus,


$$P(S = x) = \sum_{x/a \leq n \leq x/b} g_n P(S_n = x) = \sum_{x/a \leq n \leq x/b} g_n \binom{n}{k(x,n)} p^{k(x,n)} q^{n - k(x,n)} \qquad (3.1.9)$$


for all x of the form ka + mb (terms for which k(x,n) is not an integer are equal to zero).
Denote by B(z; n, p) the binomial d.f. with parameters n and
p. Then, taking into account that $P(S_n \leq x) = 1$ if $na \leq x$, and = 0 if x < nb, we have


$$P(S \leq x) = \sum_{n \leq x/a} g_n + \sum_{x/a < n \leq x/b} g_n B(k(x,n); n, p). \qquad (3.1.10)$$


The above sum contains a finite number of terms, and given $g_n$, as well as the other
parameters, computer calculations need not be lengthy. In Exercise 25, the reader is encouraged
to create an Excel worksheet and carry out the calculations based on (3.1.10).


3.1.2 The case where N has a Poisson distribution 


The calculations in the previous section were provided for special distributions of X’s. 
In this section, we will see that if N has a Poisson distribution, then we can consider a 
sufficiently large class of the distributions of separate terms (addends) X;. The distribution 
of S in the case where N is Poisson is called compound Poisson. 

To show how we can treat S in this case and for future references, we consider some 
general properties of the Poisson distribution. Actually, these properties have their intrinsic 
value. 

Consider l independent Poisson r.v.’s, $N_1, ..., N_l$ with respective means $\lambda_1, ..., \lambda_l$, and set
$N = N_1 + ... + N_l$. Clearly, N is a Poisson r.v. with parameter $\lambda = \lambda_1 + ... + \lambda_l$.

For example, a company receives each day claims of l types; the daily numbers of claims
of each type are independent and have Poisson distributions. The first question we are
interested in is: what is the distribution of the number of claims of a particular type given
the total number of claims? We will see that this distribution is binomial, and this may be
considered a distinctive feature of the Poisson distribution. In general, the joint distribution
of the $N_i$’s given N is multinomial (for the description of this distribution, see Section
0.3.1.2).


Proposition 9 Let $p_i = \lambda_i/\lambda$, i = 1, ..., l. (So, $p_1 + ... + p_l = 1$.) Then for any n =
1, 2, ..., and any non-negative integers $m_1, ..., m_l$ such that $m_1 + ... + m_l = n$,


$$P(N_1 = m_1, ..., N_l = m_l \mid N = n) = \frac{n!}{m_1! \cdots m_l!}\, p_1^{m_1} \cdots p_l^{m_l}. \qquad (3.1.11)$$


In particular, for any i = 1, ..., l, and k = 0, ..., n,


$$P(N_i = k \mid N = n) = \binom{n}{k} p_i^{k} (1 - p_i)^{n-k}. \qquad (3.1.12)$$


A proof will be given at the end of this section.

Thus, since the multinomial distribution corresponds to the case of independent trials,
given N = n, we may identify the n claims that arrived with n independent trials, where each trial,
independently of the others, has l possible results. The probability that a particular claim is
of type i is $p_i = \lambda_i/\lambda$. We discuss this issue in a bit more detail in Section 3.2.1.
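The binomial form of the conditional distribution is easy to confirm numerically for particular values. The sketch below is illustrative (the function names and the values 2, 5 and n = 6 are our assumptions); it computes P(N_i = k | N = n) straight from the definition of conditional probability, lumping all other types into one Poisson r.v., and compares it with the binomial probabilities with success probability equal to the share of type i in the total mean.

```python
import math

def poisson_pmf(n, lam):
    """Poisson probability e^(-lam) lam^n / n!."""
    return math.exp(-lam) * lam**n / math.factorial(n)

def conditional_pmf(k, n, lam_i, lam_rest):
    """P(N_i = k | N = n), where N = N_i + (sum of the other, independent Poisson counts)."""
    joint = poisson_pmf(k, lam_i) * poisson_pmf(n - k, lam_rest)
    return joint / poisson_pmf(n, lam_i + lam_rest)

def binomial_pmf(k, n, p):
    """Binomial probability C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

lam_i, lam_rest, n = 2.0, 5.0, 6          # illustrative values only
for k in range(n + 1):
    print(k, conditional_pmf(k, n, lam_i, lam_rest), binomial_pmf(k, n, lam_i / (lam_i + lam_rest)))
```

The two printed columns should coincide up to rounding error for every k.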

The next fact may be considered converse to the first.

Let now N be the random number of some objects, and suppose that N is a Poisson r.v.
with a mean of λ. Each object, independently of the other objects and of the number of the
objects, may belong to one of l types. For each object, the probability of belonging to type
i is p_i; p_1 + ... + p_l = 1.

For example, each day, a company deals with N claims, the size of each claim equals
either $100 or $150 with respective probabilities p_1 and p_2.




Coming back to the general wording, denote by $N_1, ..., N_l$ the numbers of objects of types
i = 1, ..., l. Clearly, $N_1 + ... + N_l = N$.


Proposition 10 Let $\lambda_i = p_i\lambda$, i = 1, ..., l. Then the r.v.’s $N_1, ..., N_l$ are independent and
have the Poisson distribution with respective parameters $\lambda_1, ..., \lambda_l$.


The fact that the r.v.’s are Poisson is not very surprising; the fact that they are independent
is less expected. If N is not a Poisson r.v., this may not be true; so the property established
may also be viewed as a special property of the Poisson distribution. The r.v.’s $N_i$ are
sometimes called marked Poisson r.v.’s: they count only “marked” objects.
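A small simulation makes the independence claim tangible. The sketch below is an illustration only: the sampler (Knuth’s multiplication method), the seed, and the values λ = 10, p_1 = 0.3 are assumptions of ours. It draws a Poisson number of objects, marks each as type 1 with probability p_1, and reports the sample means of the two counts together with their sample covariance, which should be close to zero.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate values of lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_marked_counts(lam, p1, trials, seed=0):
    """Return a list of pairs (N_1, N_2) for two types with probabilities p1 and 1 - p1."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(trials):
        n = sample_poisson(lam, rng)
        n1 = sum(1 for _ in range(n) if rng.random() < p1)   # objects of type 1
        pairs.append((n1, n - n1))
    return pairs

def mean(values):
    return sum(values) / len(values)

pairs = simulate_marked_counts(lam=10.0, p1=0.3, trials=20000)
m1 = mean([a for a, _ in pairs])
m2 = mean([b for _, b in pairs])
cov = mean([a * b for a, b in pairs]) - m1 * m2
print(m1, m2, cov)   # sample means near 3 and 7, sample covariance near 0
```

Zero covariance alone does not prove independence, of course; the simulation only illustrates the statement, while the proof is given in the text. Replacing the Poisson sampler by a fixed number of objects produces a clearly negative covariance, which shows that the property is specific to the Poisson case.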


Let us come back to arriving claims, denote by N the total number of claims, and by
$X_j$ the size of the jth claim. Let N be a Poisson r.v., $E\{N\} = \lambda$. Assume that the X’s are
independent, and each X takes on l values $x_1, ..., x_l$ with respective probabilities $p_1, ..., p_l$.

Consider the sum $S = X_1 + ... + X_N$, and denote by $N_i$ the number of the r.v.’s X that took
on the value $x_i$, i = 1, ..., l. Then $N_1 + ... + N_l = N$ and the total aggregate claim


$$S = x_1 N_1 + ... + x_l N_l, \qquad (3.1.13)$$


which may essentially simplify calculations, especially if l is not large.


EXAMPLE 1. Let us come back to the arriving claims with a size of $100 or $150.
Assume that the number of claims during a day is a Poisson r.v. N with a mean of 40,
and on the average, 75% of claims equal $100. If we had been solving the problem in a
straightforward fashion, we would have introduced the r.v.’s


$$X_j = \begin{cases} 100 & \text{with probability } 3/4, \\ 150 & \text{with probability } 1/4, \end{cases}$$


and would have considered $S = X_1 + ... + X_N$, the sum where not only the separate terms
are random, but the number of terms is random also.
As we saw, this is a complex object. However, in the case under consideration, we may
just write

$$S = \$100 \cdot N_1 + \$150 \cdot N_2,$$


where N_1 and N_2 are the numbers of claims equal to $100 and $150, respectively. By
Proposition 10, N_1 and N_2 are independent Poisson r.v.’s with parameters λ_1 = 0.75 · 40 = 30 and
λ_2 = 0.25 · 40 = 10, respectively.

Thus, the sum of 40 r.v.’s on the average has been reduced to the sum of only two (!)
r.v.’s. Such a sum is easily tractable. The first characteristics may be written immediately:


$$E\{S\} = 100E\{N_1\} + 150E\{N_2\} = 100\lambda_1 + 150\lambda_2 = 4{,}500;$$
$$Var\{S\} = 100^{2}Var\{N_1\} + 150^{2}Var\{N_2\} = 100^{2}\lambda_1 + 150^{2}\lambda_2 = 525{,}000.$$


As to the probabilities that S assumes particular values, they cannot be presented by a simple
formula, but may be easily computed numerically since we deal with just two variables.
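The numerical computation mentioned at the end of this example can be sketched as follows. The code is an illustration under our own naming and truncation choices (`n_max` = 150 is an assumption that makes the neglected Poisson tails negligible); it tabulates the distribution of S from two independent Poisson counts with means 30 and 10 and recovers the mean and variance given above.

```python
import math

def poisson_pmf_table(lam, n_max):
    """[P(N = 0), ..., P(N = n_max)] for a Poisson r.v. with mean lam, built recursively."""
    table = [math.exp(-lam)]
    for n in range(1, n_max + 1):
        table.append(table[-1] * lam / n)
    return table

def aggregate_pmf(lam1, lam2, x1, x2, n_max):
    """Distribution of S = x1*N1 + x2*N2 for independent Poisson N1, N2 (each truncated at n_max)."""
    w1 = poisson_pmf_table(lam1, n_max)
    w2 = poisson_pmf_table(lam2, n_max)
    pmf = {}
    for n1 in range(n_max + 1):
        for n2 in range(n_max + 1):
            s = x1 * n1 + x2 * n2
            pmf[s] = pmf.get(s, 0.0) + w1[n1] * w2[n2]
    return pmf

pmf = aggregate_pmf(30.0, 10.0, 100, 150, n_max=150)
mean = sum(s * w for s, w in pmf.items())
var = sum(s * s * w for s, w in pmf.items()) - mean**2
prob_le_5000 = sum(w for s, w in pmf.items() if s <= 5000)
print(mean, var)         # close to 4,500 and 525,000
print(prob_le_5000)      # P(S <= 5000), computed from the tabulated distribution
```

Any other probability concerning S can be read off the same dictionary.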


Route 1 => page 221 




EXAMPLE 3. Let us come back to Examples 2.2.2-1,2. We see that the scheme of these
examples differs from the scheme above only in notation.
In Example 2.2.2-1, $N = \tilde{Y}_1$, where $\tilde{Y}_1$ is the number of the r.v.’s $Y_i$ that took on the value
1. By Proposition 10, $\tilde{Y}_1$ has the Poisson distribution with parameter $\lambda_1 = p\lambda$.
In Example 2.2.2-2, $N = \tilde{Y}_1 + 2\tilde{Y}_2$, where $\tilde{Y}_1$ is defined as above, and $\tilde{Y}_2$ is the number
of the r.v.’s $Y_i$ that took on the value 2. By Proposition 10, $\tilde{Y}_1$ and $\tilde{Y}_2$ are independent and
have the Poisson distributions with respective parameters $\lambda_1 = p_1\lambda$ and $\lambda_2 = p_2\lambda$.


Let us turn to proofs. It suffices to restrict ourselves to the case l = 2. The difference
between this case and the general case is not essential.

Proof of Proposition 9. As has been already mentioned, N has the Poisson distribution
with parameter $\lambda = \lambda_1 + \lambda_2$. Set also $m_1 = k$. Then $m_2 = n - k$. Since $N_1, N_2$ are
independent,


$$P(N_1 = k, N_2 = n-k \mid N = n) = \frac{P(N_1 = k, N_2 = n-k, N_1 + N_2 = n)}{P(N = n)} = \frac{P(N_1 = k, N_2 = n-k)}{P(N = n)} = \frac{P(N_1 = k)P(N_2 = n-k)}{P(N = n)}$$

$$= \frac{\exp\{-\lambda_1\}\lambda_1^{k}}{k!}\cdot\frac{\exp\{-\lambda_2\}\lambda_2^{n-k}}{(n-k)!} \Big/ \frac{\exp\{-(\lambda_1 + \lambda_2)\}(\lambda_1 + \lambda_2)^{n}}{n!}.$$


The exponential terms cancel out, and we get that


$$P(N_1 = k, N_2 = n-k \mid N = n) = \frac{n!}{k!(n-k)!}\cdot\frac{\lambda_1^{k}\lambda_2^{n-k}}{(\lambda_1+\lambda_2)^{n}} = \binom{n}{k}\left(\frac{\lambda_1}{\lambda}\right)^{k}\left(\frac{\lambda_2}{\lambda}\right)^{n-k}.$$


Let us turn to (3.1.12). In the case l = 2, it follows from (3.1.11) just because
$P(N_1 = k, N_2 = n-k \mid N = n) = P(N_1 = k \mid N = n)$.

Consider the general case l > 2. First, it suffices to consider (3.1.12) for i = 1. Secondly,
we may set $\tilde{N}_2 = N_2 + ... + N_l$. The latter r.v. is Poisson with parameter $\tilde{\lambda}_2 = \lambda_2 + ... + \lambda_l$.
Since $N = N_1 + \tilde{N}_2$, we have reduced the problem to the case l = 2. ■


Proof of Proposition 10. Let again l = 2. For $n \neq 0$, given N = n, the n objects may be
identified with n independent trials, each of which may be successful (the object is of the
first type) with probability $p_1$. Hence, for any n, and any $k \leq n$,


$$P(N_1 = k, N_2 = n - k \mid N = n) = \binom{n}{k} p_1^{k} p_2^{n-k}. \qquad (3.1.14)$$


For n = 0, (3.1.14) is also true because $P(N_1 = 0, N_2 = 0 \mid N = 0) = 1$; and by convention,
$\binom{0}{0} p_1^{0} p_2^{0} = 1$ also. By the multiplication rule,


$$P(N_1 = k, N_2 = m) = P(N_1 = k, N_2 = m, N = m + k) = P(N_1 = k, N = m + k) = P(N_1 = k \mid N = m + k)\,P(N = m + k)$$

$$= \frac{(k+m)!}{k!\,m!}\, p_1^{k} p_2^{m}\, e^{-\lambda}\frac{\lambda^{k+m}}{(k+m)!} = e^{-p_1\lambda}\frac{(p_1\lambda)^{k}}{k!}\cdot e^{-p_2\lambda}\frac{(p_2\lambda)^{m}}{m!} = e^{-\lambda_1}\frac{\lambda_1^{k}}{k!}\cdot e^{-\lambda_2}\frac{\lambda_2^{m}}{m!}.$$


This is a product of Poisson probabilities, so $N_1, N_2$ are Poisson and independent by
Proposition 0.2. ■
Route 1 => page 223 


3.1.3 The m.g.f. method 


We are coming back to the general situation. When applying the m.g.f. method, we use
the basic formula (1.5), try to find the m.g.f. of S, and attempt to determine the distribution
that corresponds to this m.g.f.

For example, in the compound Poisson case, in accordance with Proposition 3,


$$M_S(z) = \exp\{\lambda(M_X(z) - 1)\},$$


which may help in many situations. In particular, see Section 3.2 where we are exploring 
the case of several homogeneous groups. 

Let us consider one more distribution of N restricting ourselves just to one but a very 
illustrative example which may be called classical. 

EXAMPLE 1. Let N have the negative binomial distribution with parameters $(p, \nu)$. In
this case, the distribution of S is called compound negative binomial. Assume also that the
X’s have the exponential distribution with parameter a.

First, consider the first version of the negative binomial distribution when N = 1, 2, ... and
the probabilities are specified by (0.3.1.13). In this case (see Section 0.4.3.2),


$$M_X(z) = \frac{1}{1 - z/a}, \qquad M_N(z) = \left(\frac{p e^{z}}{1 - q e^{z}}\right)^{\nu}, \qquad (3.1.15)$$


where q = 1 - p. Then, by (1.5),


$$M_S(z) = \left(\frac{p M_X(z)}{1 - q M_X(z)}\right)^{\nu} = \left(\frac{p}{1/M_X(z) - q}\right)^{\nu} = \left(\frac{p}{1 - z/a - q}\right)^{\nu} = \left(\frac{1}{1 - z/(pa)}\right)^{\nu}. \qquad (3.1.16)$$


The expression in the parentheses, $\frac{1}{1 - z/(pa)}$, is the m.g.f. of the exponential distribution
with the parameter pa.
We see that the m.g.f. (3.1.16) coincides with the m.g.f. of the $\Gamma$-distribution with
parameters $pa, \nu$; so we have arrived at a rather tractable distribution. For $\nu = 1$, we deal with the
exponential r.v. with parameter pa.


Now, consider the second version of the negative binomial distribution with the same
parameters: p and $\nu$; see (0.3.1.15). This case presupposes that N may take on zero value
with a probability of $p^{\nu}$.


So (see again Section 0.4.3.2), $M_N(z) = \left(\frac{p}{1 - q e^{z}}\right)^{\nu}$, and hence,


$$M_S(z) = \left(\frac{p}{1 - q M_X(z)}\right)^{\nu} = \left(\frac{p}{1 - q/(1 - z/a)}\right)^{\nu}. \qquad (3.1.17)$$


It is straightforward to verify that the expression in the parentheses above may be rewritten
as $p + q\,\frac{1}{1 - z/(ap)}$. Then (3.1.17) may be rewritten as

$$M_S(z) = (M(z))^{\nu}, \qquad (3.1.18)$$

where

$$M(z) = p + q\,\frac{1}{1 - z/(ap)}. \qquad (3.1.19)$$


The function $\frac{1}{1 - z/(ap)}$ is the m.g.f. of the exponential distribution with parameter ap. Let
a (non-random) variable $Y \equiv 0$. The m.g.f. $M_Y(z) = 1$. We see that M(z) is the mixture
of two m.g.f.’s: $M_Y(z)$ and the exponential m.g.f. with parameter ap. Denote by F(x) the
d.f. corresponding to M(z), by $F_a(x)$ the d.f. of the exponential distribution with parameter
a, and by E(x) the d.f. of Y. (That is, E(x) = 1 for $x \geq 0$ and = 0 for x < 0.) Then from
(3.1.19) it follows that

$$F(x) = pE(x) + qF_{ap}(x), \qquad (3.1.20)$$


the mixture of the corresponding distributions. The graph is given in Fig. 3.

FIGURE 3.

Let first $\nu = 1$. In this case, $M_S(z) = M(z)$, and the r.v. S has the distribution F. Let us
denote by $Z_a$ an exponential r.v. with parameter a. Then, in view of (3.1.20), we can write
that the aggregate claim


$$S = \begin{cases} 0 & \text{with probability } p, \\ Z_{ap} & \text{with probability } q. \end{cases} \qquad (3.1.21)$$

The case S = 0 corresponds to the case when N = 0, and this event indeed occurs with
probability p. With probability q, the r.v. S is exponential. From (3.1.21) it follows that
$E\{S\} = p \cdot 0 + q \cdot \frac{1}{pa} = \frac{q}{pa}$, which certainly coincides with what we would obtain
using (1.1) or (3.1.7).
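The rewriting step used to pass from (3.1.17) to (3.1.18)-(3.1.19) can be checked directly. A worked version, using only q = 1 - p:

```latex
\begin{aligned}
\frac{p}{1 - \dfrac{q}{1 - z/a}}
  &= \frac{p\,(1 - z/a)}{(1 - q) - z/a}
   = \frac{p\,(a - z)}{ap - z} \\
  &= \frac{p\,(ap - z) + apq}{ap - z}
   = p + q\,\frac{ap}{ap - z}
   = p + q\,\frac{1}{1 - z/(ap)} .
\end{aligned}
```

The third equality uses $p(a - z) = p(ap - z) + ap(1 - p)$.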

Let, say, p = 0.01, a= 1, and v = 1. Then E{N} =q/p = 99 and S is the sum of a large 
(random) number of exponential r.v.’s each of which has a mean of one. Nevertheless, S has 
a simple structure; namely, with probability 0.01 it is equal to zero, and with probability 
0.99 it may be viewed as just one (!) exponential r.v. with a mean of p~! = 100. 
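The structure just described can be checked numerically. The following is only a simulation sketch, not part of the original derivation: it draws $N$ from the geometric distribution on $\{0, 1, ...\}$ with $p = 0.01$, sums $N$ exponential claims with $a = 1$, and compares the result with the mixture "zero with probability $p$, exponential with mean $1/(pa)$ otherwise". The function name, the seed, and the sample size are illustrative assumptions.

```python
import math
import random


def simulate_aggregate_claim(p, a, rng):
    """Draw S = X_1 + ... + X_N, N geometric on {0, 1, ...}, X_i ~ Exp(a)."""
    # Inversion: P(N >= k) = q**k, so N = floor(ln U / ln q) with U in (0, 1].
    u = 1.0 - rng.random()
    n = int(math.log(u) / math.log(1.0 - p))
    # A sum of n i.i.d. Exp(a) variables has the Gamma(n, scale 1/a) distribution.
    return rng.gammavariate(n, 1.0 / a) if n > 0 else 0.0


rng = random.Random(2015)  # fixed seed (an arbitrary choice) for reproducibility
p, a = 0.01, 1.0
sample = [simulate_aggregate_claim(p, a, rng) for _ in range(50_000)]

share_zero = sum(s == 0.0 for s in sample) / len(sample)
positive = [s for s in sample if s > 0.0]
mean_positive = sum(positive) / len(positive)
# For an exponential r.v. with mean 1/(pa), P(S > 1/(pa)) = exp(-1).
tail_share = sum(s > 1.0 / (p * a) for s in positive) / len(positive)

print(f"P(S = 0)            ~ {share_zero:.4f}  (theory: {p})")
print(f"E[S | S > 0]        ~ {mean_positive:.1f}  (theory: {1 / (p * a):.0f})")
print(f"P(S > 100 | S > 0)  ~ {tail_share:.3f}  (theory: {math.exp(-1):.3f})")
```

The simulated frequencies should be close to the theoretical values, up to Monte Carlo error.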

If $\nu \ne 1$, the distribution of $S$ is more complicated. However, if $\nu$ is an integer, in view of (3.1.18), $S$ may be represented as the sum $Y_1 + ... + Y_\nu$, where the Y's are i.i.d. r.v.'s with the above distribution $F$. The d.f. of this sum is the convolution $F^{*\nu}(x)$. If $\nu$ is small, calculating such a simple distribution is an easy task.

In conclusion, it is again worth noting that all constructions in Section 3.1 concern partic- 
ular “good” distributions. In general, and sometimes even for these distributions, a simula- 
tion of the flow of claims—instead of direct calculations—may prove to be more efficient. 


3.2 The case of several homogeneous groups 


Now, we consider a portfolio consisting of $l$ homogeneous groups of clients. First, we pay attention to the number of claims and the probability that a claim comes from a particular group. After that, we turn to the claim sizes and the aggregate payment.




3.2.1 The probability of coming from a particular group 


Let $N_i$ be the number of claims coming from the ith group, $i = 1, ..., l$; $N = N_1 + ... + N_l$, the total number of claims. We assume the $N_i$'s to be independent and Poisson with respective parameters $\lambda_1, ..., \lambda_l$.

Let $\lambda = \lambda_1 + ... + \lambda_l$ and $p_i = \lambda_i/\lambda$. In Section 3.1.2, we have proved that given $N$, the joint distribution of $N_1, ..., N_l$ is multinomial with parameters $p_1, ..., p_l$.

We mentioned also that from this it follows that the probability that a particular claim comes from the ith group is $p_i$. Let us clarify more carefully what this means.

The model is static and we count all claims that arrived during a fixed period. Then we can assume that they arrive simultaneously; say, at the end of the period. Consequently, it does not matter in which order we consider these claims, and we may view this order as arbitrary.

For certainty, consider the first group. Assume we do know that $N$ took on a particular value $n \ne 0$, and $N_1$ took on a value $k$. Then the probability that a particular claim (from the $n$ claims that arrived) came from the first group is $k/n$. This is true if we choose a claim at random from $n$ claims or consider a specific claim, say, the fifth (provided $n \ge 5$).

So, formally, if $A$ is the event that a particular claim chosen came from the first group, then $P(A \mid N_1 = k, N = n) = k/n$. Then by the formula for total probability, for $n \ge 1$,

$$P(A \mid N = n) = \sum_{k=0}^{n} P(A \mid N_1 = k, N = n)\,P(N_1 = k \mid N = n) = \frac{1}{n}\sum_{k=0}^{n} k\,P(N_1 = k \mid N = n).$$


In Proposition 9 we have shown that the conditional distribution of $N_1$ given $N = n$ is binomial with parameters $(n, p_1)$; see (3.1.12). The sum above is the mean of the distribution mentioned and, hence, equals $np_1$. Then

$$P(A \mid N = n) = \frac{1}{n}\,np_1 = p_1.$$

Without loss of generality, we can also postulate that if there are no claims, then the probability that a claim comes from the first group is also $p_1$. In other words, let us set by convention $P(A \mid N = 0) = p_1$. This is a formal (and non-significant) assumption. Then

$$P(A) = \sum_{n=0}^{\infty} P(A \mid N = n)\,P(N = n) = p_1\sum_{n=0}^{\infty} P(N = n) = p_1.$$

Certainly, the same concerns all other groups.
Consider also a couple of examples based on the same Proposition 9. 


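EXAMPLE 1. Let $l = 3$, $\lambda_1 = 2$, $\lambda_2 = 3$, $\lambda_3 = 5$. Assume that the total number of claims, $N$, took on the value 6. What is the probability that $N_1 \le 2$ and $N_2 \le 3$? Since $p_1 = 0.2$, $p_2 = 0.3$, and $p_3 = 0.5$, in accordance with (3.1.11),

$$P(N_1 \le 2,\ N_2 \le 3 \mid N = 6) = \sum_{i=0}^{2}\sum_{j=0}^{3} \frac{6!}{i!\,j!\,(6 - i - j)!}\,(0.2)^{i}(0.3)^{j}(0.5)^{6-i-j},$$

which may be calculated even by hand.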
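For readers who prefer to let a machine do the hand calculation, here is a minimal sketch. It assumes the event in question is $\{N_1 \le 2,\ N_2 \le 3\}$ given $N = 6$ and evaluates the multinomial sum directly; the helper name `multinomial_pmf` is introduced only for this illustration.

```python
from math import factorial


def multinomial_pmf(counts, probs):
    """P(N_1 = c_1, ..., N_l = c_l | N = sum of the c's) for multinomial counts."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)  # exact: every partial quotient is an integer
    value = float(coeff)
    for c, p in zip(counts, probs):
        value *= p ** c
    return value


lams = [2.0, 3.0, 5.0]
probs = [lam / sum(lams) for lam in lams]  # p_i = lambda_i / lambda
n = 6

# Sum over i = 0..2 (first group) and j = 0..3 (second group).
prob = sum(
    multinomial_pmf((i, j, n - i - j), probs)
    for i in range(3)
    for j in range(4)
)
print(f"P(N1 <= 2, N2 <= 3 | N = 6) = {prob:.5f}")
```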




EXAMPLE 2. Let $l = 3$, $\lambda_1 = 100$, $\lambda_2 = 200$, $\lambda_3 = 500$. Assume that the total number of claims $N$ has assumed a particular value of 900. What is $P(N_1 \le 120 \mid N = 900)$? According to (3.1.12), the conditional distribution of $N_1$ is binomial with parameters $p = p_1 = 1/8$ and $n = 900$. Let $X$ be a r.v. with this distribution. Of course, computing $P(X \le 120)$ exactly is meaningless, but we can apply the normal approximation by using the central limit theorem. (In the case of the binomial distribution, it is called the Moivre-Laplace theorem.) Since $E\{X\} = \frac{1}{8} \cdot 900 = 112.5$ and $\mathrm{Var}\{X\} = \frac{1}{8} \cdot \frac{7}{8} \cdot 900 \approx 98.44$, we can write

$$P(X \le 120) \approx \Phi\big((120 - 112.5)/\sqrt{98.44}\big) \approx \Phi(0.756) \approx 0.78.$$
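With software, the exact binomial probability is cheap to obtain, so the quality of the approximation can be inspected directly. The sketch below is an illustration only; it assumes the event is $\{X \le 120\}$ and also shows the variant with the usual continuity correction, which is not used in the text.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, x = 900, 1 / 8, 120

# Exact binomial probability P(X <= x).
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

# Normal (Moivre-Laplace) approximation, without and with continuity correction.
mean, var = n * p, n * p * (1 - p)
approx = NormalDist().cdf((x - mean) / sqrt(var))
approx_cc = NormalDist().cdf((x + 0.5 - mean) / sqrt(var))

print(f"mean = {mean}, variance = {var:.2f}")
print(f"exact            : {exact:.4f}")
print(f"normal approx.   : {approx:.4f}")
print(f"with correction  : {approx_cc:.4f}")
```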


3.2.2 A general scheme and reduction to one group 


Now, together with the $N_i$'s and $N$, we adopt the following notation:

$X_{ij}$, $i = 1, ..., l$, $j = 1, 2, ...$, is the size of the jth claim coming from the ith group;

$F_i(x)$, $i = 1, ..., l$, is the common d.f. of the $X_{ij}$;

$M_i(z)$, $i = 1, ..., l$, is the common m.g.f. of the $X_{ij}$;

$S_i = \sum_{j=1}^{N_i} X_{ij}$, $i = 1, ..., l$, the total of all the claims in the ith group;

$S = \sum_{i=1}^{l} S_i$, the total of all the claims.

It makes sense to emphasize that for each group $i$, the r.v.'s $X_{ij}$ are identically distributed. We assume that all r.v.'s $N_i$, $i = 1, ..., l$, and $X_{ij}$, $i = 1, ..., l$, $j = 1, 2, ...$, are mutually independent. Then the distribution of $S$ is given by

$$F_S = F_{S_1} * ... * F_{S_l}, \qquad (3.2.1)$$

the convolution of the distributions of the aggregate claims for separate groups. If we manage to find the separate $F_{S_i}$, and if $l$ is not large, then operation (3.2.1) may be numerically tractable.


In the case where all r.v.'s $N_i$ have Poisson distributions, the above scheme may be simplified. For now, we will reason somewhat heuristically; at the end of the section we will provide a rigorous scheme.

Set again $\lambda_i = E\{N_i\}$, and $\lambda = \lambda_1 + ... + \lambda_l$. So, $N$ is Poisson with parameter $\lambda$.

Let us consider the portfolio as a whole and denote by $Y_k$ the size of the kth claim arriving, whichever group it comes from. Let $B_{ik}$ be the event that the kth claim comes from the ith group. We know that for any $k$, the probability $P(B_{ik}) = p_i$, where $p_i = \lambda_i/\lambda$. Then the d.f. of $Y_k$ does not depend on $k$ and is equal to the function

$$F_Y(x) = P(Y_k \le x) = \sum_{i=1}^{l} P(Y_k \le x \mid B_{ik})\,P(B_{ik}) = \sum_{i=1}^{l} F_i(x)\,p_i. \qquad (3.2.2)$$

So, the Y's are i.i.d. and the distribution of the Y's is a mixture of the distributions $F_i$. Eventually, we may unify the groups in one homogeneous group writing

$$S = \sum_{k=1}^{N} Y_k.$$
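EXAMPLE 1. Let $l = 3$, $\lambda_1 = 100$, $\lambda_2 = 200$, $\lambda_3 = 500$, and r.v.'s

$$X_{1j} = \begin{cases} 1 & \text{with probability } 1/3, \\ 2 & \text{with probability } 2/3, \end{cases} \qquad X_{2j} = \begin{cases} 1 & \text{with probability } 1/4, \\ 2 & \text{with probability } 3/4, \end{cases} \qquad X_{3j} = \begin{cases} 1 & \text{with probability } 1/6, \\ 2 & \text{with probability } 5/6. \end{cases}$$

Then $\lambda = 800$, $p_1 = \frac{100}{800} = \frac{1}{8}$, $p_2 = \frac{1}{4}$, and $p_3 = \frac{5}{8}$. The distribution of each $Y_k$ is the mixture of the distributions above. Therefore, $Y_k$ takes on values 1 and 2, and

$$P(Y_k = 1) = \frac{1}{8} \cdot \frac{1}{3} + \frac{1}{4} \cdot \frac{1}{4} + \frac{5}{8} \cdot \frac{1}{6} = \frac{5}{24}.$$

Hence,

$$Y_k = \begin{cases} 1 & \text{with probability } \tilde p_1 = 5/24, \\ 2 & \text{with probability } \tilde p_2 = 19/24. \end{cases}$$

Thus, $S = Y_1 + ... + Y_N$, where $N$ is a Poisson r.v. with parameter $\lambda = 800$. By (1.2) and (1.4),

$$E\{S\} = E\{Y\}E\{N\} = \frac{43}{24} \cdot 800 = \frac{4300}{3} \approx 1433.3...,$$
$$\mathrm{Var}\{S\} = E\{Y^2\}E\{N\} = \frac{81}{24} \cdot 800 = 2700.$$

The distribution of $S$ is compound Poisson. Certainly, we cannot write this distribution in an explicit form but we can write its m.g.f. By (1.6),

$$M_S(z) = \exp\{800\,(M_Y(z) - 1)\} = \exp\Big\{800\Big(\frac{5}{24}e^{z} + \frac{19}{24}e^{2z} - 1\Big)\Big\} = \exp\Big\{\frac{100}{3}\big(5e^{z} + 19e^{2z}\big) - 800\Big\}.$$

In the case under consideration, we can proceed further using the construction of Section 3.1.2. In accordance with the results of this section,

$$S = N_1 + 2N_2,$$

where $N_1$ and $N_2$ are independent Poisson r.v.'s with parameters $\tilde p_1\lambda = \frac{5}{24} \cdot 800 = \frac{500}{3}$ and $\tilde p_2\lambda = \frac{19}{24} \cdot 800 = \frac{1900}{3}$, respectively. The program for calculating such a distribution may be straightforward.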
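One possible version of such a program is sketched below. It is an illustration under stated assumptions, not the author's code: the two Poisson distributions are truncated to ten standard deviations around their means (a hypothetical cut-off that leaves a negligible tail), and the probabilities are computed on the log scale to avoid overflow.

```python
import math


def poisson_pmf_table(lam, width=10.0):
    """Map k -> P(N = k) for k within `width` standard deviations of the mean."""
    sd = math.sqrt(lam)
    lo = max(0, math.floor(lam - width * sd))
    hi = math.ceil(lam + width * sd)
    # Log scale: evaluating lam**k / k! directly would overflow for large k.
    return {
        k: math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        for k in range(lo, hi + 1)
    }


lam = 800.0
pmf_n1 = poisson_pmf_table(lam * 5 / 24)   # number of claims of size 1
pmf_n2 = poisson_pmf_table(lam * 19 / 24)  # number of claims of size 2

# Distribution of S = N1 + 2 * N2 by direct convolution.
pmf_s = {}
for n1, q1 in pmf_n1.items():
    for n2, q2 in pmf_n2.items():
        s = n1 + 2 * n2
        pmf_s[s] = pmf_s.get(s, 0.0) + q1 * q2

total = sum(pmf_s.values())
mean_s = sum(s * q for s, q in pmf_s.items())
var_s = sum((s - mean_s) ** 2 * q for s, q in pmf_s.items())
prob_le_1500 = sum(q for s, q in pmf_s.items() if s <= 1500)

print(f"total mass  = {total:.12f}")
print(f"E[S]        = {mean_s:.4f}")
print(f"Var[S]      = {var_s:.4f}")
print(f"P(S <= 1500) = {prob_le_1500:.4f}")
```

The computed mean and variance can be compared with the values $4300/3$ and $2700$ obtained in the example.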


EXAMPLE 2. Let $l = 2$, $\lambda_1 = 200$, $\lambda_2 = 300$. Assume the r.v.'s $X_{1j}$ and $X_{2j}$ are exponentially distributed with $E\{X_{1j}\} = 2$ and $E\{X_{2j}\} = 3$. Then $\lambda = 500$, $p_1 = \frac{200}{500} = 0.4$, $p_2 = 0.6$, and $S = Y_1 + ... + Y_N$, where $N$ is a Poisson r.v. with parameter $\lambda = 500$, and the distribution of the Y's is the mixture of the exponential distributions above. The density

$$f_Y(x) = p_1 f_1(x) + p_2 f_2(x),$$

where $f_1$ and $f_2$ are the densities of the r.v.'s $X_{1j}$ and $X_{2j}$, respectively. Thus, for $x \ge 0$,

$$f_Y(x) = 0.4 \cdot \frac{1}{2}e^{-x/2} + 0.6 \cdot \frac{1}{3}e^{-x/3} = 0.2\big(e^{-x/2} + e^{-x/3}\big). \qquad (3.2.3)$$

The collection of two groups above can be reduced to a homogeneous portfolio with the distribution of a particular claim given in (3.2.3). The m.g.f. of distribution (3.2.3) is the mixture of the m.g.f.'s of the above exponential distributions and equals

$$M(z) = 0.4 \cdot \frac{1}{1 - 2z} + 0.6 \cdot \frac{1}{1 - 3z} = \frac{1 - 2.4z}{(1 - 2z)(1 - 3z)}.$$

The m.g.f. of the r.v. $S$ is

$$M_S(z) = \exp\{500\,(M(z) - 1)\}.$$

Calculating $E\{S\}$ and $\mathrm{Var}\{S\}$ is easy and may be done either using (1.2) and (1.4), or directly as follows:

$$E\{S\} = \lambda_1 E\{X_{1j}\} + \lambda_2 E\{X_{2j}\} = 200 \cdot 2 + 300 \cdot 3 = 1300,$$
$$\mathrm{Var}\{S\} = \lambda_1 E\{X_{1j}^2\} + \lambda_2 E\{X_{2j}^2\} = 200 \cdot 8 + 300 \cdot 18 = 7000.$$
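The two routes mentioned above (through the mixed claim $Y$, or group by group) can be compared in a few lines. This is a small arithmetic sketch; the tuple layout is an assumption made for the illustration, and it uses the fact that an exponential r.v. with mean $\mu$ has second moment $2\mu^2$.

```python
# (Poisson rate lambda_i, E{X_ij}, E{X_ij^2}) for each group of exponential claims.
groups = [
    (200.0, 2.0, 2 * 2.0**2),
    (300.0, 3.0, 2 * 3.0**2),
]

lam = sum(rate for rate, _, _ in groups)
weights = [rate / lam for rate, _, _ in groups]  # mixture weights p_i

# Route 1: reduce to one homogeneous group with the mixed claim Y.
mean_y = sum(w * m1 for w, (_, m1, _) in zip(weights, groups))
second_y = sum(w * m2 for w, (_, _, m2) in zip(weights, groups))
mean_s_mixture = lam * mean_y
var_s_mixture = lam * second_y

# Route 2: add up the compound Poisson moments of the separate groups.
mean_s_direct = sum(rate * m1 for rate, m1, _ in groups)
var_s_direct = sum(rate * m2 for rate, _, m2 in groups)

print(f"weights: {weights}")
print(f"E[S]   : {mean_s_mixture:.1f} (mixture), {mean_s_direct:.1f} (direct)")
print(f"Var[S] : {var_s_mixture:.1f} (mixture), {var_s_direct:.1f} (direct)")
```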


EXAMPLE 3 ([153, N7]*). An insurance company pays claims at a Poisson rate of 2,000 
per year. Claims are divided into three categories: “minor”, “major”, and “severe”, with 
payment amounts of $1,000, $5,000, and $10,000, respectively. The proportion of “minor” 
claims is 50%. The total expected claim payments per year is $7,000,000. What is the 
proportion of “severe” claims? 

Denote by $\lambda_i$, $i = 1, 2, 3$, the Poisson rates above. Let $\lambda = \lambda_1 + \lambda_2 + \lambda_3$ and $p_i = \lambda_i/\lambda$. The term "proportion" concerns the probabilities $p_i$. Let us choose \$1000 as a monetary unit. Then the total expected payment is $\lambda_1 + 5\lambda_2 + 10\lambda_3 = 7000$. Dividing it by $\lambda = 2000$, we have $p_1 + 5p_2 + 10p_3 = 3.5$. Together with $p_1 = 0.5$ and $p_1 + p_2 + p_3 = 1$, this leads to $p_3 = 0.1$.
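The elimination behind the last step can be written out explicitly. The sketch below works in exact fractions and in units of \$1000; the variable names are chosen only for this illustration.

```python
from fractions import Fraction

payments = [1, 5, 10]   # minor, major, severe, in units of $1000
rate = 2000             # Poisson rate of claims per year
expected_total = 7000   # expected yearly payment, in units of $1000
p_minor = Fraction(1, 2)

mean_claim = Fraction(expected_total, rate)  # p1 + 5*p2 + 10*p3 = 7/2

# Remaining unknowns: p_major + p_severe = 1 - p_minor and
# 5*p_major + 10*p_severe = mean_claim - 1*p_minor.
rest_prob = 1 - p_minor
rest_mean = mean_claim - payments[0] * p_minor
p_severe = (rest_mean - payments[1] * rest_prob) / (payments[2] - payments[1])
p_major = rest_prob - p_severe

print(f"p_major = {p_major}, p_severe = {p_severe}")
```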


In conclusion, we provide a formal scheme which justifies the above construction more 
rigorously and gives more insight into it. 
In view of (1.6), for all $i = 1, ..., l$,

$$M_{S_i}(z) = \exp\{\lambda_i(M_i(z) - 1)\}. \qquad (3.2.4)$$

It is worthwhile to emphasize that $M_i$ is not the m.g.f. of the ith claim but the common m.g.f. of separate claims from the ith group. Since the $S_i$'s are mutually independent, the m.g.f.

$$M_S(z) = \prod_{i=1}^{l} M_{S_i}(z) = \prod_{i=1}^{l} \exp\{\lambda_i(M_i(z) - 1)\} = \exp\Big\{\sum_{i=1}^{l} \lambda_i(M_i(z) - 1)\Big\}.$$

* Reprinted with permission of the Casualty Actuarial Society.

It is easy to verify that the last formula may be rewritten as

$$M_S(z) = \exp\Big\{\lambda\Big(\sum_{i=1}^{l} \frac{\lambda_i}{\lambda}M_i(z) - 1\Big)\Big\} = \exp\{\lambda(M(z) - 1)\}, \qquad (3.2.5)$$

where

$$M(z) = \sum_{i=1}^{l} p_i M_i(z), \quad \text{and} \quad p_i = \frac{\lambda_i}{\lambda}. \qquad (3.2.6)$$
i=1 


The m.g.f. in (3.2.6) is the mixture of the m.g.f.'s $M_i$ with weights $p_i$, and hence the corresponding distribution $F$ is the mixture of the distributions $F_i$. So we have arrived at the distribution in (3.2.2).

Function (3.2.5) is the m.g.f. of a compound Poisson distribution. More specifically, if $W$ is a r.v. with the m.g.f. given by (3.2.5), then in accordance with (1.6), $W$ may be represented (or viewed) as

$$W = Y_1 + ... + Y_N,$$

where $N, Y_1, Y_2, ...$ are mutually independent r.v.'s such that $N$ is a Poisson r.v. with parameter $\lambda$ and the Y's have the common distribution $F$ from (3.2.2) and the m.g.f. $M(z)$.


4 PREMIUMS AND SOLVENCY. 
NORMAL APPROXIMATION 


4.1 Limit theorems 
4.1.1 The Poisson case 


In this section, we restrict ourselves to a homogeneous portfolio or, more precisely, we assume the random addends $X_i$ in (0.1) are independent and identically distributed (i.i.d.). We know that if the number of addends $N$ is fixed and large, then the distribution of the sum may be closely approximated by a normal distribution. The question is whether this remains true if $N$ is random but takes large values with large probabilities. For example, is a normal approximation possible if $N$ is a Poisson r.v. with a large expected value? The answer to this particular question is “yes”, as reflected in Theorem 11 below. However, as we will see, in the general case a similar result is true only under certain conditions, though these are mild and natural. We begin with the Poisson case.

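Let $N = N_\lambda$ be a r.v. having the Poisson distribution with parameter $\lambda$, and let

$$S_{(\lambda)} = X_1 + ... + X_{N_\lambda}. \qquad (4.1.1)$$

We enclosed $\lambda$ in parentheses in order to distinguish this notation from $S_n = X_1 + ... + X_n$. As already mentioned in Section 3.1.2, the distribution of $S_{(\lambda)}$ is called compound Poisson. Set $m = E\{X_i\}$ and $\sigma^2 = \mathrm{Var}\{X_i\}$. By (1.2) and (1.4),

$$E\{S_{(\lambda)}\} = m\lambda, \qquad \mathrm{Var}\{S_{(\lambda)}\} = (\sigma^2 + m^2)\lambda.$$

Consider the normalized r.v.

$$S^*_{(\lambda)} = \frac{S_{(\lambda)} - E\{S_{(\lambda)}\}}{\sqrt{\mathrm{Var}\{S_{(\lambda)}\}}} = \frac{S_{(\lambda)} - m\lambda}{\sqrt{(\sigma^2 + m^2)\lambda}}.$$

As we know, $E\{S^*_{(\lambda)}\} = 0$, $\mathrm{Var}\{S^*_{(\lambda)}\} = 1$.

Theorem 11 For any $x$,

$$P(S^*_{(\lambda)} \le x) \to \Phi(x) \quad \text{as } \lambda \to \infty,$$

where, as usual, $\Phi(x)$ is the standard normal d.f.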


Route 1 => page 233 


4.1.2 The general case 


This subsection is devoted to the general case and particular examples different from the 
Poisson scheme. 

From now on, $N_\lambda$ is an arbitrary integer-valued r.v. depending on a varying general parameter $\lambda$. We keep the same notation $\lambda$ so that the reader who has decided to restrict her/himself to the compound Poisson case can easily read the next section concerning premium evaluation.

Set $n_\lambda = E\{N_\lambda\}$ and $v_\lambda^2 = \mathrm{Var}\{N_\lambda\}$. Most illustrations below will concern the following four examples.

1. The case of bounded variance (BV): in this case, we assume that $n_\lambda$ is unlimitedly growing while $v_\lambda^2 \le d$ for some number $d$ independent of $\lambda$.

   Let, for instance, $N_\lambda = \lambda + K_\lambda$, where $\lambda$ is an integer, and the "random component" $K_\lambda$ is an integer-valued r.v. such that $-s \le K_\lambda \le s$ for some fixed number $s$. Assume $E\{K_\lambda\} = 0$. Then $n_\lambda = \lambda$, $v_\lambda^2 = \mathrm{Var}\{K_\lambda\} = E\{K_\lambda^2\} \le s^2$.

2. The Poisson distribution (PD) case: $N_\lambda$ is Poisson with parameter $\lambda$. Then $n_\lambda = v_\lambda^2 = \lambda$.

3. The negative binomial distribution (NBD) case: $N_\lambda$ has the negative binomial distribution (2.2.5) with fixed parameter $p$ and (increasing) parameter $\nu = \lambda$. In this case, $n_\lambda = \lambda(1 - p)/p$, $v_\lambda^2 = \lambda(1 - p)/p^2$ (see Section 0.3.1.4).

4. The geometric distribution (GD) case: $N_\lambda$ has the geometric distribution (2.2.7) with parameter $p = 1/\lambda$. In this case, $n_\lambda = (1 - p)/p = \lambda - 1$ and $v_\lambda^2 = (1 - p)/p^2 = \lambda(\lambda - 1)$ (see Section 0.3.1.3).

   It is worthwhile to note that the fact that the geometric distribution is a particular instance of the negative binomial distribution should not mislead us. In Case 3, the parameter $p$ of the negative binomial distribution (see, e.g., (2.2.5)) is fixed and the parameter $\nu = \lambda$ varies, while in Case 4, the varying parameter is $p$ and $\nu = 1$.
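To see how differently the PD, NBD, and GD cases behave as $\lambda$ grows, one can tabulate the mean $n_\lambda$, the variance $v_\lambda^2$, and the ratio $v_\lambda/\sqrt{n_\lambda}$ from the formulas above. The following is a minimal sketch; the function names and the choice $p = 1/2$ for the NBD case are assumptions made for the illustration.

```python
import math


def moments_pd(lam):
    """(n_lambda, v_lambda^2) in the Poisson case."""
    return lam, lam


def moments_nbd(lam, p):
    """(n_lambda, v_lambda^2) in the negative binomial case with nu = lambda."""
    return lam * (1 - p) / p, lam * (1 - p) / p**2


def moments_gd(lam):
    """(n_lambda, v_lambda^2) in the geometric case with p = 1/lambda."""
    p = 1 / lam
    return (1 - p) / p, (1 - p) / p**2


def ratio(n_lam, v2_lam):
    """The ratio v_lambda / sqrt(n_lambda)."""
    return math.sqrt(v2_lam / n_lam)


for lam in (10.0, 100.0, 1000.0, 10000.0):
    r_pd = ratio(*moments_pd(lam))
    r_nbd = ratio(*moments_nbd(lam, 0.5))
    r_gd = ratio(*moments_gd(lam))
    print(f"lambda = {lam:7.0f}:  PD {r_pd:.3f}   NBD {r_nbd:.3f}   GD {r_gd:.3f}")
```

In the PD and NBD cases the ratio stays constant (1 and $1/\sqrt{p}$, respectively), while in the GD case it equals $\sqrt{\lambda}$ and grows without bound.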


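Let us return to the general case. Let $S_{(\lambda)}$, $m$, and $\sigma$ be defined as above. By (1.1) and (1.3),

$$E\{S_{(\lambda)}\} = mn_\lambda, \qquad \mathrm{Var}\{S_{(\lambda)}\} = \sigma^2 n_\lambda + m^2 v_\lambda^2. \qquad (4.1.2)$$

For brevity, let us set $d_\lambda^2 = \sigma^2 n_\lambda + m^2 v_\lambda^2$ and consider the normalized r.v.

$$S^*_{(\lambda)} = \frac{S_{(\lambda)} - E\{S_{(\lambda)}\}}{\sqrt{\mathrm{Var}\{S_{(\lambda)}\}}} = \frac{S_{(\lambda)} - mn_\lambda}{d_\lambda}.$$

Our goal is to establish conditions of asymptotic normality of $S^*_{(\lambda)}$. As we will see, to accomplish this, we should also normalize the r.v. $N_\lambda$, setting

$$N^*_\lambda = \frac{N_\lambda - n_\lambda}{v_\lambda}.$$

Clearly, $E\{N^*_\lambda\} = 0$ and $\mathrm{Var}\{N^*_\lambda\} = 1$.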


To make our exposition more explicit, we adopt the following notation. We say that a sequence of r.v.'s $\xi_\lambda$ converges to a r.v. $\xi$ in distribution as $\lambda \to \infty$, and write it as

$$\xi_\lambda \xrightarrow{d} \xi, \qquad (4.1.3)$$

if the distributions of the r.v.'s $\xi_\lambda$ converge to the distribution of $\xi$. In other words, the d.f.'s $F_{\xi_\lambda}(x) \to F_\xi(x)$ as $\lambda \to \infty$. (The reader may look up the notion of convergence in distribution in Section 0.5 for more detail.)

Under notation (4.1.3), the classical Central Limit Theorem (CLT) may be stated as follows. For the normalized sum

$$S^*_n = \frac{S_n - mn}{\sigma\sqrt{n}},$$

where $n$ is a non-random integer,

$$S^*_n \xrightarrow{d} Z \quad \text{as } n \to \infty, \qquad (4.1.4)$$

where $Z$ is a standard normal r.v. (See, e.g., Section 0.6.2.)


The theorem below may be interpreted in the following way. For $S^*_{(\lambda)}$ to be asymptotically normal, not only should the r.v.'s $S_n$ be asymptotically normal for large $n$, but the same property should hold for $N_\lambda$ for large $\lambda$.

For instance, this is true if $N_\lambda$ is a Poisson r.v. since $N_\lambda$ may be presented as a sum of independent Poisson r.v.'s (see also a remark on p.164 and Exercise 2.43).

In the theorem stated below, all limits are those as $\lambda \to \infty$. Let, as above, $Z$ denote a standard normal r.v.

Theorem 12 Suppose

$$n_\lambda \to \infty, \qquad (4.1.5)$$

and either

$$\frac{v_\lambda}{\sqrt{n_\lambda}} \to 0, \qquad (4.1.6)$$

or

$$\frac{v_\lambda}{\sqrt{n_\lambda}} \to c > 0, \qquad (4.1.7)$$

and in addition,

$$N^*_\lambda \xrightarrow{d} Z. \qquad (4.1.8)$$

Then

$$S^*_{(\lambda)} \xrightarrow{d} Z \quad \text{as } \lambda \to \infty. \qquad (4.1.9)$$


A proof of Theorem 12 is given in Section 4.4. Let us consider examples and discuss conditions (4.1.6)-(4.1.8).

EXAMPLE 1. In the BV case, $v_\lambda^2 \le d$ and (4.1.6) holds if (4.1.5) is true.

Certainly, $v_\lambda$ does not have to be bounded for (4.1.6) to be true: it suffices to assume

$$v_\lambda^2 = o(n_\lambda) \qquad (4.1.10)$$

(for the definition of $o(\cdot)$, see Appendix, Section 4.1). Thus, we can state

Corollary 13 Normal convergence (4.1.9) is true if (4.1.5) and (4.1.10) are satisfied.


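EXAMPLE 2. Let

$$N_\lambda = \lambda + K_\lambda,$$

where $\lambda$ is an integer, $K_\lambda$ takes on integer values, and

$$|K_\lambda| \le \lambda^{1/4}.$$

Then $n_\lambda \ge \lambda - \lambda^{1/4}$, $v_\lambda^2 \le E\{K_\lambda^2\} \le \sqrt{\lambda}$, and $(v_\lambda^2/n_\lambda) \le \big[\sqrt{\lambda}/(\lambda - \lambda^{1/4})\big] \to 0$.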


However, the case (4.1.10) should not be considered the most important since in many applications the variance $v_\lambda^2$ has the same order as the mean $n_\lambda$.

The classical example is, of course, the Poisson distribution for which $v_\lambda^2 = n_\lambda = \lambda$. So, $(v_\lambda/\sqrt{n_\lambda}) = 1$ and $c$ in (4.1.7) equals 1. To apply Theorem 12 in this case, we need to verify condition (4.1.8).

The validity of (4.1.8) has been shown previously at the end of Section 2.2.1.2 (p.164) as well as in Exercise 2-43 (with detailed advice). Thus, we have come to

Corollary 14 In the PD case, (4.1.9) is true (which, certainly, is equivalent to Theorem 11).


Next, we consider the NBD case; i.e., Case 3 above. Now $(v_\lambda^2/n_\lambda) = 1/p$, so $c = 1/\sqrt{p}$, and we should again verify (4.1.8). For simplicity, we restrict ourselves to the case when $\lambda$ is an integer. In view of (2.2.4), see also Exercise 8c, the r.v. $N_\lambda$ in this case may be represented as the sum $Y_1 + ... + Y_\lambda$ where the Y's are i.i.d. r.v.'s having the geometric distribution with parameter $p$. Hence, by the CLT, $N^*_\lambda \xrightarrow{d} Z$ and (4.1.8) is true.

> To verify (4.1.8) for the instance when $\lambda$ is an arbitrary real number, we can write the m.g.f. (2.2.4) as

$$M_{N_\lambda}(z) = \left(\frac{p}{1 - qe^{z}}\right)^{\lambda} = \left(\frac{p}{1 - qe^{z}}\right)^{[\lambda]}\left(\frac{p}{1 - qe^{z}}\right)^{\lambda - [\lambda]},$$

where $[\lambda]$ is the integer part of $\lambda$. Thus, we can represent $N_\lambda$ as the sum $Y_1 + ... + Y_{[\lambda]} + U_\lambda$, where the Y's are the same as above and $U_\lambda$ (the remainder, so to speak) is a r.v. independent of the Y's and having the m.g.f. $\big(p/(1 - qe^{z})\big)^{\lambda - [\lambda]}$.

The sum $Y_1 + ... + Y_{[\lambda]}$ is asymptotically normal by the CLT, while $U_\lambda$ is negligible with respect to the total sum. Indeed, $E\{Y_1 + ... + Y_{[\lambda]}\} = n_{[\lambda]} = [\lambda](1 - p)/p$ and $E\{U_\lambda\} = (\lambda - [\lambda])(1 - p)/p \le (1 - p)/p$ (see Section 0.3.1.4). Then, $(n_{[\lambda]}/E\{U_\lambda\}) \to \infty$. The rest of the proof is carried out in accordance with the scheme of Exercises 2-43,44. <

Thus, we have come to 


Corollary 15 In the NBD case, (4.1.9) is true. 


The reader should now have a grasp of the requirements of Theorem 12. The classical CLT implies that the sum $S_n$ with a non-random large number of addends is asymptotically normal. Theorem 12 establishes conditions under which this is true for a random number of addends.

If the variance $v_\lambda^2 = o(n_\lambda)$, the condition (4.1.6) obviously holds.

If we deal with the case (4.1.7), then the normalized sum $S^*_{(\lambda)}$ is asymptotically normal if $N^*_\lambda$ is asymptotically normal. The latter, for example, is true if we can represent $N_\lambda$ as a sum of many i.i.d. addends plus perhaps a negligible remainder. As we saw, it is true, for example, in the PD and NBD cases. The instance where $N_\lambda$ has a binomial distribution is explored in Exercise 37.

Consider now an example more sophisticated than the binomial case. 


EXAMPLE 3. Let $N_\lambda$ have the Poisson-Poisson distribution from Example 2.2.2-3 with $\lambda_1$ from this example equal to $\lambda$. For the other parameter, we will keep the same notation $\lambda_2$. Thus,

$$N_\lambda = Y_1 + ... + Y_K, \qquad (4.1.11)$$

where $K$ is a Poisson r.v. with parameter $\lambda$, and the Y's are independent r.v.'s having the Poisson distribution with parameter $\lambda_2$. First of all, as has been computed in Example 2.2.2-3, $n_\lambda = E\{N_\lambda\} = \lambda\lambda_2$ and $v_\lambda^2 = \mathrm{Var}\{N_\lambda\} = \lambda(\lambda_2 + \lambda_2^2)$. Hence, (4.1.7) is true with $c = \sqrt{1 + \lambda_2}$. To verify (4.1.8), we should just observe that scheme (4.1.11) is a particular case of the general scheme (0.1). The role of $N$ is played by $K$ and the role of $S$ by $N_\lambda$. Since $K$ is a Poisson r.v., (4.1.8) follows in this case from Corollary 14. Thus, (4.1.9) is true.


Now we consider a situation where the conditions of Theorem 12 are not satisfied and its 
conclusion is false. 


EXAMPLE 4. Consider the GD case. We have computed that in this case, $n_\lambda = \lambda - 1$, $v_\lambda^2 = \lambda(\lambda - 1)$, so $(v_\lambda^2/n_\lambda) = \lambda \to \infty$ and hence (4.1.7) is not true. Note that the standard deviation $v_\lambda = \sqrt{\lambda(\lambda - 1)}$ has the same order as the mean value $n_\lambda = \lambda - 1$. This means, in particular, that although $\lambda$ is growing, $N_\lambda$ can assume small values with a non-negligible probability and, hence, with a non-negligible probability the sum $S_{(\lambda)}$ is not large. Consequently, we should not expect $S^*_{(\lambda)}$ to be asymptotically normal.

As a matter of fact, the following interesting result is true.

Theorem 16 (Rényi). If $N_\lambda$ has the geometric distribution with parameter $p = 1/\lambda$, then

$$\frac{1}{m\lambda}S_{(\lambda)} \xrightarrow{d} \xi \quad \text{as } \lambda \to \infty,$$

where $\xi$ is a standard exponential r.v.; that is, $\xi$ is exponential and $E\{\xi\} = 1$.

We omit the proof of this theorem which may be carried out with use of m.g.f.'s (see, e.g., [12], [111]).
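Although the proof is omitted, the statement of Theorem 16 is easy to probe by simulation. The sketch below rests on assumptions that are not in the text: claims uniform on $[0, 2]$ (so $m = 1$), $\lambda = 50$, the normalization $S_{(\lambda)}/(m\lambda)$, and an arbitrary seed and sample size. It compares empirical tail frequencies with the standard exponential tail $e^{-x}$.

```python
import math
import random


def geometric_sum(lam, rng):
    """S = X_1 + ... + X_N with N geometric (p = 1/lam) on {0, 1, ...}."""
    p = 1.0 / lam
    u = 1.0 - rng.random()                     # u lies in (0, 1]
    n = int(math.log(u) / math.log(1.0 - p))   # inversion of P(N >= k) = q**k
    # Illustrative claim distribution: uniform on [0, 2], so that m = 1.
    return sum(rng.uniform(0.0, 2.0) for _ in range(n))


rng = random.Random(16)  # fixed seed, arbitrary
lam, m = 50.0, 1.0
scaled = [geometric_sum(lam, rng) / (m * lam) for _ in range(20_000)]

sample_mean = sum(scaled) / len(scaled)


def tail(x):
    """Empirical P(S / (m * lam) > x)."""
    return sum(s > x for s in scaled) / len(scaled)


print(f"mean of S/(m*lam) ~ {sample_mean:.3f}  (limit: 1)")
for x in (0.5, 1.0, 2.0):
    print(f"P(. > {x}) ~ {tail(x):.3f}   exp(-x) = {math.exp(-x):.3f}")
```

For a finite $\lambda$ the agreement is only approximate; the discrepancy should shrink as $\lambda$ increases.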


4.2 Estimation of premiums 


Formally, our model does not involve premiums since $N$ counts claims coming from the portfolio as a whole rather than from clients who pay premiums. We can, however, talk about an amount of money $c = c_\lambda$ sufficient to cover claims with a given probability $\beta$; i.e., the amount $c$ for which $P(S_{(\lambda)} \le c) \ge \beta$. We may view $c$ as an aggregate premium and define the loading coefficient $\theta$ by the relation $c = (1 + \theta)E\{S_{(\lambda)}\}$. If normal approximation works in our situation, then we can apply it to the determination of the minimal acceptable $\theta$, following the scheme of Section 2.3.1.1. In particular, the counterpart of (2.3.1.5) will be

$$\theta \approx \frac{q_{\beta s}\sqrt{\mathrm{Var}\{S_{(\lambda)}\}}}{E\{S_{(\lambda)}\}}, \qquad (4.2.1)$$

where $q_{\beta s}$ is the $\beta$-quantile of the standard normal distribution. The derivation is absolutely similar to what we did in Section 2.3.1.1.

In view of (1.2)-(1.4), in the case when $N$ is Poisson with parameter $\lambda$, the last formula may be rewritten as

$$\theta \approx \frac{q_{\beta s}\sqrt{m_2\lambda}}{m\lambda},$$

where $m_2 = E\{X_i^2\}$, the second moment of the X's. Thus, in the compound Poisson case,

$$\theta \approx \frac{q_{\beta s}\sqrt{m_2}}{m\sqrt{\lambda}} = \frac{q_{\beta s}\sqrt{m^2 + \sigma^2}}{m\sqrt{\lambda}} = \frac{q_{\beta s}}{\sqrt{\lambda}}\sqrt{1 + k^2}, \qquad (4.2.2)$$

where $k = \sigma/m$ is the coefficient of variation of the r.v.'s $X$. All three representations in (4.2.2) may be useful.


EXAMPLE 1. Consider the case when the X's are lognormal and $N$ is Poisson with parameter $\lambda$. We will see that in this case, to estimate $\theta$, we should know $\lambda$, the security level $\beta$, and $\mathrm{Var}\{\ln(X_j)\}$.

Log-normality means that each $X_j = \exp\{a + b\eta_j\}$, where $a$ and $b$ are parameters and the $\eta_j$'s are independent standard normal r.v.'s [see also (2.1.1.13)]. Because $X_j = e^{a}\exp\{b\eta_j\}$, if we multiply all X's by $e^{-a}$, then we will switch to the simpler r.v.'s $\exp\{b\eta_j\}$ but $k$ and hence the representation (4.2.2) will not change. So, without loss of generality we can set $a = 0$. In this case, $m = e^{b^2/2}$, $m_2 = e^{2b^2}$ [see (2.1.1.14), (2.1.1.15)] and hence $\sqrt{m_2}/m = e^{b^2/2}$.

Now, let us observe that $\ln(X_j) = b\eta_j$. Since $\mathrm{Var}\{\eta_j\} = 1$, we have $b^2 = \mathrm{Var}\{\ln(X_j)\}$. So, the coefficient $\sqrt{m_2}/m$ depends only on $\mathrm{Var}\{\ln(X_j)\}$. Thus,

$$\theta \approx \frac{q_{\beta s}}{\sqrt{\lambda}}\,e^{b^2/2}. \qquad (4.2.3)$$

For example, if $\beta = 0.9$, $b^2 = 0.2$, and $\lambda = 400$, then the loading coefficient $\theta \approx \frac{1.282}{20}e^{0.1} \approx 0.071$.

If we want to estimate the premium $c$ itself, then $a$ should be involved. The mean aggregate claim is given by $E\{S_{(\lambda)}\} = \lambda m = \lambda e^{a + b^2/2}$ and

$$c = (1 + \theta)\lambda e^{a + b^2/2} = e^{a + b^2/2}\Big(\lambda + q_{\beta s}\sqrt{\lambda}\sqrt{1 + k^2}\Big),$$

where $\sqrt{1 + k^2} = e^{b^2/2}$. It is worth noting that the second term is much smaller than the first for large $\lambda$.
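The numbers in this example are easy to reproduce. The following sketch evaluates (4.2.3) and the resulting premium; the function names are hypothetical, and the value $a = 0$ in the premium call is chosen only to have something to print.

```python
from math import exp, sqrt
from statistics import NormalDist


def loading_lognormal(beta, b2, lam):
    """Normal-approximation loading (4.2.3): q_beta * exp(b^2 / 2) / sqrt(lam)."""
    q_beta = NormalDist().inv_cdf(beta)  # standard normal beta-quantile
    return q_beta * exp(b2 / 2) / sqrt(lam)


def premium(theta, a, b2, lam):
    """Aggregate premium c = (1 + theta) * E{S}, with E{S} = lam * exp(a + b^2/2)."""
    return (1 + theta) * lam * exp(a + b2 / 2)


theta = loading_lognormal(beta=0.9, b2=0.2, lam=400)
print(f"theta ~ {theta:.4f}")
print(f"premium for a = 0: {premium(theta, a=0.0, b2=0.2, lam=400):.1f}")
```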


Route 1 => page 243 


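EXAMPLE 2. Let the X's be uniformly distributed on $[0, a]$, and let $N$ be negative binomial with parameters $p = 1/2$ and $\nu = \lambda$. For the same reason as in Example 1, the coefficient $\theta$ does not depend on $a$, so we can set $a = 1$. (If $a \ne 1$, we may divide all X's by $a$, which will not change $\theta$.) In this case, $E\{X\} = 1/2$ and $\mathrm{Var}\{X\} = 1/12$. In view of (4.1.2), with $q = 1 - p$,

$$E\{S_{(\lambda)}\} = \frac{n_\lambda}{2} = \frac{\lambda q}{2p},$$
$$\mathrm{Var}\{S_{(\lambda)}\} = (1/12)n_\lambda + (1/4)v_\lambda^2 = \frac{\lambda q}{4p}\Big(\frac{1}{3} + \frac{1}{p}\Big).$$

Now it is easy to calculate that

$$\theta \approx \frac{q_{\beta s}}{\sqrt{\lambda}}\sqrt{\frac{3 + p}{3(1 - p)}}.$$

For $\beta = 0.8$, we would have $\theta \approx 1.286/\sqrt{\lambda}$.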
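The closed form above can be cross-checked against the general formulas (4.1.2) and (4.2.1). The sketch below is illustrative only: the function name is an assumption, and $\lambda = 400$ is an arbitrary value used to print a concrete number.

```python
from math import sqrt
from statistics import NormalDist


def loading_uniform_nb(beta, p, lam):
    """Loading from (4.2.1) for X uniform on [0, 1] and N negative binomial."""
    q_beta = NormalDist().inv_cdf(beta)
    n_lam = lam * (1 - p) / p        # E{N}
    v2_lam = lam * (1 - p) / p**2    # Var{N}
    m, sigma2 = 0.5, 1 / 12          # mean and variance of a uniform [0, 1] r.v.
    mean_s = m * n_lam
    var_s = sigma2 * n_lam + m**2 * v2_lam   # formula (4.1.2)
    return q_beta * sqrt(var_s) / mean_s


beta, p, lam = 0.8, 0.5, 400.0
theta = loading_uniform_nb(beta, p, lam)
closed_form = NormalDist().inv_cdf(beta) / sqrt(lam) * sqrt((3 + p) / (3 * (1 - p)))

print(f"theta               ~ {theta:.5f}")
print(f"closed form         ~ {closed_form:.5f}")
print(f"theta * sqrt(lambda) ~ {theta * sqrt(lam):.3f}")
```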




4.3 The accuracy of normal approximation 


We restrict ourselves to the compound Poisson distribution with $N$ being a Poisson r.v. with parameter $\lambda$.

It is remarkable that the rate of convergence to the normal distribution in this case is practically the same as in the classical case of a fixed number of addends which we considered in Theorem 2.3.2-6.

Let

$$L_\lambda = \frac{1}{\sqrt{\lambda}}\,\frac{E\{|X_i|^3\}}{(E\{X_i^2\})^{3/2}}. \qquad (4.3.1)$$

(The r.-h.s. of (4.3.1) does not depend on $i$ since the $X_i$'s are identically distributed.) The only difference between (4.3.1) and the Lyapunov fraction in (2.3.2.1) from Chapter 2 is that (4.3.1) involves non-central moments. Nevertheless, as in the case of Chapter 2, $L_\lambda$ is dimensionless and not sensitive to the change of scale (see details in Chapter 2).

Theorem 17 Let $F^*_\lambda(x)$ be the d.f. of the r.v. $S^*_{(\lambda)}$. Then there exists an absolute constant $C_1$ such that for any $x$,

$$|F^*_\lambda(x) - \Phi(x)| \le C_1 L_\lambda. \qquad (4.3.2)$$

With regard to the constant $C_1$, the last results show that⁵

$$C_1 \le 0.792. \qquad (4.3.3)$$


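Similar to the exposition in Section 2.3.2, we can write a bound for the loading coefficient $\theta$, that is, the counterpart of (2.3.2.7):

$$\theta \ge \frac{q_{\beta + \Delta_\lambda, s}\sqrt{\mathrm{Var}\{S_{(\lambda)}\}}}{E\{S_{(\lambda)}\}}, \qquad (4.3.4)$$

where $\Delta_\lambda$ is the r.-h.s. of (4.3.2). Let $m_2 = E\{X_i^2\}$. Since in our case $E\{S_{(\lambda)}\} = m\lambda$ and $\mathrm{Var}\{S_{(\lambda)}\} = m_2\lambda$, inequality (4.3.4) may be rewritten as

$$\theta \ge \frac{q_{\beta + \Delta_\lambda, s}\sqrt{m_2}}{m\sqrt{\lambda}}. \qquad (4.3.5)$$

EXAMPLE 1. Consider the situation of Example 4.2-1. Again we can set $a = 0$ because the ratio $\sqrt{m_2}/m$ does not depend on $a$. As we have computed before, $\sqrt{m_2}/m = e^{b^2/2}$. Now, $E\{X_i^3\} = E\{e^{3b\eta}\} = M_\eta(3b) = e^{9b^2/2}$ (see also Section 2.1.1.3) and because $m_2 = e^{2b^2}$, we have

$$L_\lambda = \frac{1}{\sqrt{\lambda}}\,\frac{e^{9b^2/2}}{e^{3b^2}} = \frac{1}{\sqrt{\lambda}}\,e^{3b^2/2}.$$

If, as in Example 4.2-1, $b^2 = \mathrm{Var}\{\ln(X_i)\} = 0.2$, then $\Delta_\lambda \approx 0.792 \cdot 1.350 \cdot \lambda^{-1/2} \approx 1.069 \cdot \lambda^{-1/2}$.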
⁵ The constant was obtained in [92] and also in [77]. Without computing the constant, the bound (4.3.2) was obtained earlier in [118]. A detailed discussion may be found in [12].


Set $\beta = 0.9$, $\lambda = 400$. Then $\Delta_\lambda \approx 0.053$, and $q_{\beta + \Delta_\lambda, s} \approx 1.675$, which results in the bound

$$\theta \ge \frac{1.675\,e^{0.1}}{20} \approx 0.093.$$

The rough approximation in Example 4.2-1 was equal to 0.071. The reader is recommended to look up the important remark in Example 2.3.2-1.
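The bound can be recomputed as follows. This is a hedged sketch: it uses the constant 0.792 from (4.3.3) and the lognormal expression $L_\lambda = e^{3b^2/2}/\sqrt{\lambda}$, with function names invented for the illustration. Because the quantile is evaluated at the unrounded $\Delta_\lambda$, it may differ slightly in the third decimal from a value obtained after rounding $\Delta_\lambda$ to 0.053.

```python
from math import exp, sqrt
from statistics import NormalDist

C1 = 0.792  # upper bound for the absolute constant, see (4.3.3)


def lyapunov_fraction_lognormal(b2, lam):
    """L_lambda = exp(3 b^2 / 2) / sqrt(lam) for lognormal claims with a = 0."""
    return exp(1.5 * b2) / sqrt(lam)


def loading_lower_bound(beta, b2, lam):
    """Right-hand side of (4.3.5) for lognormal claims, together with Delta."""
    delta = C1 * lyapunov_fraction_lognormal(b2, lam)
    if beta + delta >= 1.0:
        raise ValueError("beta + Delta must be below 1 for the quantile to exist")
    q = NormalDist().inv_cdf(beta + delta)
    return q * exp(b2 / 2) / sqrt(lam), delta


theta_bound, delta = loading_lower_bound(beta=0.9, b2=0.2, lam=400)
theta_rough = NormalDist().inv_cdf(0.9) * exp(0.1) / sqrt(400)

print(f"Delta       ~ {delta:.4f}")
print(f"theta bound ~ {theta_bound:.4f}")
print(f"rough theta ~ {theta_rough:.4f}")
```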


4.4 Proof of Theorem 12 


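We prove the theorem proceeding from conditions (4.1.5), (4.1.7), and (4.1.8). The case (4.1.6) is easier, and to run a proof for this case, it suffices to set $c = 0$ in the proof below.

First, we will show that under the conditions of the theorem,

$$\frac{N_\lambda}{d_\lambda^2} \xrightarrow{P} k = \frac{1}{\sigma^2 + m^2c^2}. \qquad (4.4.1)$$

(For the definition of convergence in probability, see Section 0.5.) Indeed,

$$\frac{N_\lambda}{d_\lambda^2} = \frac{v_\lambda N^*_\lambda + n_\lambda}{d_\lambda^2} = \frac{v_\lambda}{d_\lambda^2}N^*_\lambda + \frac{n_\lambda}{d_\lambda^2}. \qquad (4.4.2)$$

By conditions (4.1.5)-(4.1.7),

$$\frac{v_\lambda}{d_\lambda^2} = \frac{v_\lambda}{\sigma^2 n_\lambda + m^2 v_\lambda^2} = \frac{1}{\sqrt{n_\lambda}} \cdot \frac{v_\lambda/\sqrt{n_\lambda}}{\sigma^2 + m^2(v_\lambda^2/n_\lambda)} \to 0 \cdot \frac{c}{\sigma^2 + m^2c^2} = 0, \qquad (4.4.3)$$

$$\frac{n_\lambda}{d_\lambda^2} = \frac{n_\lambda}{\sigma^2 n_\lambda + m^2 v_\lambda^2} = \frac{1}{\sigma^2 + m^2(v_\lambda^2/n_\lambda)} \to \frac{1}{\sigma^2 + m^2c^2} = k. \qquad (4.4.4)$$

Thus, since $E\{N^*_\lambda\} = 0$ and $\mathrm{Var}\{N^*_\lambda\} = 1$, relation (4.4.3) implies that $(v_\lambda/d_\lambda^2)N^*_\lambda \xrightarrow{P} 0$. Together with (4.4.4) and (4.4.2), this implies (4.4.1).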
Next, we write 


1 lifer 1 gee 
Sa) = Hm —mn,) = A Lx —mn, | = rm La —m)+m(Ny — 1) 
1% m 
=LA en N (4.4.5) 
a, & ) a 1M) 


In view of (4.4.1), asymptotically Ny is growing as kdy. The idea of the rest of the proof 
is to replace M, in the first term of (4.4.5) by the non-random number f, = [kd?], the integer 
part of kd. (We cannot replace Ny exactly by kd? since the sum limit is an integer.) The 
second term in (4.4.5) will remain unchanged. This replacement leads to the r.v. 


1 & m 
Y; = Xi ! N; , 
Maa LI m) a! 1 — My) 


4. Premiums and Solvency 237 


and the error arising because of such a replacement is the r.v. 
1 N, th 
Ya = =~ | Yi (Xi—m) — )i(Xi—m) J. 
4 \i=l i=l 


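We will prove the theorem if we show that $Y_{\lambda 1}$ is asymptotically standard normal and the error term $Y_{\lambda 2} \xrightarrow{P} 0$. We have

$$Y_{\lambda 1} = \frac{\sigma\sqrt{t_\lambda}}{d_\lambda} \cdot \frac{1}{\sigma\sqrt{t_\lambda}}\sum_{i=1}^{t_\lambda}(X_i - m) + \frac{mv_\lambda}{d_\lambda} \cdot \frac{N_\lambda - n_\lambda}{v_\lambda} = \frac{\sigma\sqrt{t_\lambda}}{d_\lambda}S^*_{t_\lambda} + \frac{mv_\lambda}{d_\lambda}N^*_\lambda, \qquad (4.4.6)$$

where as usual the symbol $S^*_n$ denotes the normalized sum $(S_n - mn)/(\sigma\sqrt{n})$.

The main point here is that, since $S^*_{t_\lambda}$ does not involve $N_\lambda$, the r.v.'s $S^*_{t_\lambda}$ and $N^*_\lambda$ are independent. The r.v. $S^*_{t_\lambda}$ is asymptotically standard normal in view of (4.1.4), while $(mv_\lambda/d_\lambda)N^*_\lambda$ is asymptotically normal due to condition (4.1.8), as we will show. Then the linear combination of the terms in (4.4.6) will be asymptotically normal too.

Consider all of this in greater detail. In view of (4.1.5), $d_\lambda \to \infty$. Using the symbolism $a_\lambda \sim b_\lambda$ if $(a_\lambda/b_\lambda) \to 1$, we have

$$\frac{\sigma\sqrt{t_\lambda}}{d_\lambda} = \frac{\sigma\sqrt{[kd_\lambda^2]}}{d_\lambda} \sim \frac{\sigma\sqrt{kd_\lambda^2}}{d_\lambda} = \sigma\sqrt{k} = \frac{\sigma}{\sqrt{\sigma^2 + m^2c^2}}. \qquad (4.4.7)$$

On the other hand,

$$\frac{mv_\lambda}{d_\lambda} = \frac{mv_\lambda}{\sqrt{\sigma^2 n_\lambda + m^2 v_\lambda^2}} = \frac{m\,(v_\lambda/\sqrt{n_\lambda})}{\sqrt{\sigma^2 + m^2(v_\lambda^2/n_\lambda)}} \to \frac{mc}{\sqrt{\sigma^2 + m^2c^2}}. \qquad (4.4.8)$$

In view of (4.4.7) and (4.1.4),

$$\frac{\sigma\sqrt{t_\lambda}}{d_\lambda}S^*_{t_\lambda} \xrightarrow{d} \frac{\sigma}{\sqrt{\sigma^2 + m^2c^2}}Z_1, \qquad (4.4.9)$$

where $Z_1$ is a standard normal r.v. In view of (4.4.8) and conditions (4.1.7) and (4.1.8),

$$\frac{mv_\lambda}{d_\lambda}N^*_\lambda \xrightarrow{d} \frac{mc}{\sqrt{\sigma^2 + m^2c^2}}Z_2, \qquad (4.4.10)$$

where $Z_2$ is a standard normal r.v. independent of $Z_1$.

Now, let us consider the sum of the r.-h. sides of (4.4.9) and (4.4.10). As is easy to compute, the variance of the sum mentioned is equal to one. So, the sum is standard normal and hence $Y_{\lambda 1} \xrightarrow{d} Z$.

Next, consider $Y_{\lambda 2}$. Since $E\{Y_{\lambda 2}\} = 0$ [check on your own using (1.1)], to show that $Y_{\lambda 2} \xrightarrow{P} 0$, it suffices to prove that $\mathrm{Var}\{Y_{\lambda 2}\} \to 0$. Because $E\{Y_{\lambda 2}\} = 0$,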


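$$\mathrm{Var}\{Y_{\lambda 2}\} = E\{Y_{\lambda 2}^2\} = \sum_{n=0}^{\infty} E\{Y_{\lambda 2}^2 \mid N_\lambda = n\}\,P(N_\lambda = n).$$

If $n > t_\lambda$,

$$E\{Y_{\lambda 2}^2 \mid N_\lambda = n\} = \frac{1}{d_\lambda^2}E\Big\{\Big(\sum_{i=t_\lambda + 1}^{n}(X_i - m)\Big)^2\Big\} = \frac{1}{d_\lambda^2}\mathrm{Var}\Big\{\sum_{i=t_\lambda + 1}^{n}(X_i - m)\Big\} = \frac{1}{d_\lambda^2}\sigma^2(n - t_\lambda).$$

If $n < t_\lambda$,

$$E\{Y_{\lambda 2}^2 \mid N_\lambda = n\} = \frac{1}{d_\lambda^2}E\Big\{\Big(\sum_{i=n+1}^{t_\lambda}(X_i - m)\Big)^2\Big\} = \frac{1}{d_\lambda^2}\mathrm{Var}\Big\{\sum_{i=n+1}^{t_\lambda}(X_i - m)\Big\} = \frac{1}{d_\lambda^2}\sigma^2(t_\lambda - n).$$

Thus, for any $n$,

$$E\{Y_{\lambda 2}^2 \mid N_\lambda = n\} = \frac{1}{d_\lambda^2}\sigma^2|n - t_\lambda|.$$

Hence,

$$\mathrm{Var}\{Y_{\lambda 2}\} = \frac{\sigma^2}{d_\lambda^2}\sum_{n=0}^{\infty}|n - t_\lambda|\,P(N_\lambda = n) = \frac{\sigma^2}{d_\lambda^2}E\{|N_\lambda - t_\lambda|\} \le \frac{\sigma^2}{d_\lambda^2}\big(E\{(N_\lambda - t_\lambda)^2\}\big)^{1/2} = \sigma^2\Big(E\{(N_\lambda - t_\lambda)^2\}/d_\lambda^4\Big)^{1/2}. \qquad (4.4.11)$$

(We used the fact that for any r.v. $X$, it is true that $(E\{|X|\})^2 \le E\{X^2\}$; for example, because $0 \le \mathrm{Var}\{|X|\} = E\{X^2\} - (E\{|X|\})^2$.)

Next, we will use the identity $E\{(X - a)^2\} = \mathrm{Var}\{X\} + (a - E\{X\})^2$. (If the reader does not remember this identity, she/he is invited to prove it by direct calculations.) Thus,

$$E\{(N_\lambda - t_\lambda)^2\} = \mathrm{Var}\{N_\lambda\} + (n_\lambda - t_\lambda)^2 = v_\lambda^2 + (n_\lambda - t_\lambda)^2.$$

Hence,

$$\frac{E\{(N_\lambda - t_\lambda)^2\}}{d_\lambda^4} = \frac{v_\lambda^2 + (n_\lambda - [kd_\lambda^2])^2}{d_\lambda^4} = \Big(\frac{v_\lambda}{d_\lambda^2}\Big)^2 + \Big(\frac{n_\lambda}{d_\lambda^2} - \frac{[kd_\lambda^2]}{d_\lambda^2}\Big)^2. \qquad (4.4.12)$$

Since $([kd_\lambda^2]/d_\lambda^2) \to k$, in view of (4.4.4) and (4.4.3), the quantity (4.4.12) vanishes and hence $\mathrm{Var}\{Y_{\lambda 2}\} \to 0$ as $\lambda \to \infty$. ■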


5 EXERCISES 


Section 1

1. Show that (1.1) and (1.3) follow from (1.5). (Advice: Look up (0.4.4.5).)

2. If $S_n = X_1 + ... + X_n$, where the X's are i.i.d., then as we know (see Section 0.4.1), $M_{S_n}(z) = (M_X(z))^n$, where $M_X(z)$ is the m.g.f. of $X_i$. Show that Proposition 3 includes this case. (Hint: Write the m.g.f. of the non-random variable $N = n$.)

Section 2

3. Making use of the Taylor expansion for $e^x$ [see Appendix, (4.2.5)], show that the sum of probabilities in (2.1.1) is indeed one. Prove (2.1.2).

4. A husband and wife each can purchase insurance for which the payoff for the first claim is much higher than the others in the same year. In the husband's and the wife's cases considered separately, the numbers of claims are independent Poisson r.v.'s with the same $\lambda$. The couple also has an option to buy a joint insurance where the number of claims with priority is two. Find the distribution of the total number of the claims with priority covered for the case of the two separate insurances and for the case of the joint insurance. If the premium for the joint insurance is double the premium of the individual policy, what decision should the couple make? Show that the answer to the last question is the same for an arbitrary distribution of the number of claims (not only Poisson).

5. Provide an Excel worksheet as in Fig.1 from Section 2.1. Consider various $n$ and $p$; e.g., let $\lambda = 2$ and consider $n = 10, 20, 40, 100$. Estimate the accuracy of the Poisson approximation in each case and analyze the results.

6. A portfolio consists of three homogeneous groups. The number of clients in each group, $n_i$, and the probabilities of loss events for each group, $r_i$, $i = 1, 2, 3$, are given in the following table:

         | group 1 | group 2 | group 3
   $n_i$ | 231     | 124     | 347
   $r_i$ | 0.01    | 0.05    | 0.03

   (a) Find p, and the parameter $\lambda$ of the approximating Poisson distribution.

   (b) For the number of claims $N$, estimate $P(N = 15)$ and $P(N \le 15)$. (For calculations, it makes sense to use software.)

   (c) If the data is annual data, how often, on the average, can you expect less than 16 claims a year?

7. (a) Show that the classical Poisson theorem follows from Theorem 5.

   (b) In the scheme of Section 2.1, let $I_1 = 1$ or 0 each with probability 1/2, and for the remaining $I_j$, $j = 2, 3, ..., n$, the probability $p_j = 1/(n - 1)$. Set $\tilde N_n = \sum_{j=2}^{n} I_j$, the aggregate number of claims except for the first client. (i) Show that the distribution of $\tilde N_n$ converges to the Poisson distribution with $\lambda = 1$. (ii) Show that condition (2.1.6) is not true. (iii) Show that the limiting distribution for the whole $N_n$ is not a Poisson distribution. (Advice: Write the formula for total probability conditioning on $I_1$.) (iv) Find the limiting distribution for $N_n$; that is, $\lim_{n \to \infty} P(N_n = k)$. (Advice: Consider $P(N_n = k \mid I_1 = 1)$ and $P(N_n = k \mid I_1 = 0)$.) (v) What will change if $P(I_1 = 1)$ equals some fixed $p > 0$?

8.* (a) Make sure that if $r$ in (2.2.6) is an integer, then $\binom{r}{n} = 0$ for $n = r + 1, ...$, while for a non-integer $r$, the coefficient $\binom{r}{n} \ne 0$ for any $n$ and may be negative for $n > r + 1$. Make sure that, nevertheless, the coefficient in (2.2.5) is positive.

   (b) Show that for $\nu = 1$, the r.-h.s. of (2.2.4) is the m.g.f. of a geometric r.v., or more precisely, of an integer valued r.v. $K_1$ such that $P(K_1 = k) = pq^k$, where $p = 1 - q$, and $k = 0, 1, ...$. Make sure that $P(K_1 = k)$ coincides with (2.2.5) in this case. (As was noted in Section 0.3.1.3, the geometric distribution is defined in the literature in two ways: either as the distribution of a r.v. $K'$ assuming values $k = 1, 2, ...$ with probabilities $pq^{k-1}$ or as the distribution of a r.v. $K$ assuming values $k = 0, 1, 2, ...$ with probabilities


240 


9. 


10.* 


11.* 


12.* 


14.* 


15.* 


16.* 


17.* 


3. A COLLECTIVE RISK MODEL 


pq‘. The former distribution is that of the first success in a sequence of independent 
trials. Clearly, the distribution of K is that of K’ — 1. [In Section 0.3.1.3 we denoted K’ 
by N but now N stands for the number of addends in (0.1).]) 


(c) Let now v in (2.2.4) and (2.2.5) be an arbitrary natural number. Show that in this 
case, K is the sum of v independent r.v.’s having the same distribution as K; above. 
(If, instead of the distribution of Kı, we consider the distribution of the r.v. N in the 
previous exercise, then K will be the distribution of the vth success in a sequence of 
independent trials.) 


In the scheme of Section 2.2.1, let A take on values A, and Ag, and events A = {N =n}, Bj = 
{A = ài}, i = 1,2. Write the Bayes formula for P(B;|A) and make sure that it is consistent 
with (2.2.8)-(2.2.9). 


In Example 2.2.1-5, find the expected value of A given N = 6. Compare it with the uncondi- 
tional mean E{A}. 


The number N of daily claims coming from a risk portfolio is a Poisson r.v. with random 
parameter A. This parameter has the I’-distribution with parameter v = 2 and a mean value 
of 3. The value of N on a particular day was 5. What can you say about the value of A 
which realized on this day. Using, for example, Excel, estimate the conditional probability 
that A > 3. 


Assume that the number of accidents arising in a risk portfolio is a Poisson r.v. with A = 300, 
and the size of damage in each accident is an exponential r.v. with a mean of 10. The sizes of 
damages corresponding to different accidents are independent. Let each insurance contract 
involve a deductible of 2. 


(a) Write the expected value and the variance for the number of claims. (b) Write the formula 
for the probability that the number of claims will not exceed 230. Do not compute it but tell 
whether you expect this probability to be very small. (c) Estimate this probability without 
long calculations. (Advice: Use the fact that the Poisson distribution with a large parameter 
is asymptotically normal: see Exercise 2-43.) 


Assume that the number of accidents arising in a risk portfolio is negative binomial with 
parameters pı = 1/3 and v = 4, and the size of damage in each accident has the Pareto dis- 
tribution in the form (2.1.1.17) with & = 2. The sizes of damages corresponding to different 
accidents are independent. Let each insurance contract involve a deductible of 2. (a) Write 
the expected value and the variance for the number of claims. (b) Write the formula for the 
probability that the number of claims will not exceed 4. Do not compute it—at least it is not 
required—but tell whether you do or do not expect this probability to be small. 


Assume that the number of traffic accidents corresponding to a risk portfolio is a Poisson r.v. 
with A = 300, and the probability that a separate accident causes serious injuries is p = 0.07. 
The outcomes of different accidents are independent. Find the probability that the number of 
accidents with serious injuries will exceed 10. 


Consider Example 2.2.2-2. (a) Does N assume all non-negative integer values? Find the 
probabilities that N = 0,1,2,3. (b) Verify that computing E{N} and Var{N} using (1.2) and 
(1.4) leads to the answers obtained in this example. 


Compute the expected value and the variance for the Poisson-Poisson distribution from Ex- 
ample 2.2.2-3. 


Give a condition under which the sum of independent negative binomial r.v.’s is negative 
binomial. Interpret the result considering a sequence of independent trials. 


18.* Write the formulas for the expected value and the variance for the distribution from Example
2.2.2-4 making use of (1.1) and (1.3).


19.* Verify the results in Table 2.2.3-1.


20.* Calculate zero-truncated and zero-modified probabilities for the geometric distribution. With
what distribution do we deal in the former case?


Section 3


21. An employer buys an insurance for his employees. The mean number of claims coming from
the whole group of employees is 50, the standard deviation is 10. The individual loss has a
mean of 4 (units of money) and a variance of 2. The insurance company imposes a deductible
of 50 for the whole portfolio. Assuming that the distribution of the aggregated loss is closely
approximated by a $\Gamma$-distribution, find the probability that the company will pay more than
200 units.


22. Compute $E\{S\}$ and $\mathrm{Var}\{S\}$ for the following cases: (a) $N$ has a Poisson distribution with
$E\{N\} = 150$ and the X’s take on values 2, 3, 4 with probabilities 1/3, 1/2, 1/6, respectively.
(b) $N$ has a geometric distribution with parameter $p = 0.02$ and the d.f. of the X’s is D(x;3,5).
(c) $N$ has the negative binomial distribution with parameters $p = 0.02$, $\nu = 4$, and the X’s
are exponential with $E\{X\} = 3$.


23. Verify (3.1.7).


24. In the situation of Example 3.1.1-2, what should $E\{N\}$ be equal to for $P(N > 50) < 0.05$?


25. Following the scheme of Example 3.1.1-3, provide spreadsheets and particular calculations
with parameters of your choice for the models of Examples 3.1.1-3,4,5.


26. Following the scheme of Example 3.1.1-3, provide spreadsheets and particular calculations
in the case when $N$ is a Poisson r.v. with a mean of 20.


27. In the scheme of Section 3.1.2, set $l = 3$, $x_i = i$, $p_1 = 1/2$, $p_2 = 1/3$, and $\lambda = 12$. Compute
(a) $P(K_1 = 3, K_2 = 3, K_3 = 2)$, $P(K_1 < 3, K_2 < 3, K_3 < 2)$, $P(K_1 = 3, K_2 = 3\,|\,N = 8)$;
(b) $P(S = 0)$, $P(S = 3)$.


28. Let $N_1$ and $N_2$ be independent Poisson r.v.’s, $E\{N_1\} = E\{N_2\}$. Show that
$N_1 - N_2 \stackrel{d}{=} Y_1 + \dots + Y_N$, where $N = N_1 + N_2$, and the Y’s are mutually independent and
independent of $N$ r.v.’s assuming values $\pm 1$ with equal probabilities. Write the m.g.f. of
$N_1 - N_2$. (Advice: Consider two types of claims of sizes 1 and $-1$, respectively. The fact that
the claim is negative may be interpreted as if you are paid instead of paying. Denote by $N_i$,
$i = 1,2$, the number of claims of type $i$.)


29. Let $N_1$ and $N_2$ be independent Poisson r.v.’s, $E\{N_1\} = 100$ and $E\{N_2\} = 200$. Similar to
Exercise 28, represent the r.v. $3N_1 - 5N_2$ as a sum $Y_1 + \dots + Y_N$ where $N$ is a Poisson r.v. and
the Y’s are some r.v.’s that are mutually independent and independent of $N$.


30.* In the scheme of Section 3.1.3, for $\nu = 2$, $a = 10$, and $p = 0.1$, write $P(S = 0)$ and the density
$f_S(x)$ for $x > 0$.


31.* For a portfolio, the number of loss events has the negative binomial distribution with
parameters $p = 1/3$ and $\nu = 10$. The losses are independent and have the Pareto distribution
(2.1.1.18) with $\theta = 3$, $\alpha = 2$. The insurance has an ordinary deductible of 1. Find the
expected aggregate payment.


32. Let a r.v. $S$ have the m.g.f. $M_S(z) = \exp\left\{200\left[\frac{1}{1-2z} - 1\right]\right\}$ for $z < 1/2$. (a) Which
insurance model leads to this formula? (b) Without any calculations, write $E\{S\}$ and $\mathrm{Var}\{S\}$.



33. An insurance portfolio consists of two homogeneous groups of clients. $N_i$, $i = 1,2$, denotes
the number of claims coming from the $i$th group during a fixed time period. Assume that the
r.v.’s $N_i$ are independent and have Poisson distributions, $E\{N_i\}$ = 102.


(a) If N is the total number of all claims in the portfolio, what distribution does N have? 
Compute E{N}, Var{N}. Write the expression for P{N < 50}. 


(b) Estimate $P(N_1 < 11\,|\,N = 50)$.


(c) Let the amount of an individual claim in the first group always be $100, and in the 
second group let it be $300. What is the distribution of S, the total amount of aggregated 
claims, in this case? (Include a name and describe the distribution.) 


(d) Find $E\{S\}$ and $\mathrm{Var}\{S\}$. Write $M_S(z)$.


34. In the situation of Exercise 33, let the values of individual claims be random. More
specifically, assume that an individual claim in the first group assumes values 100 and 200 with
probabilities 1/3 and 2/3 respectively, while in the second group, these values are 200 and 300
with probabilities 1/2. (a) What is the distribution of $S$? (b) Find $E\{S\}$ and $\mathrm{Var}\{S\}$. Write
$M_S(z)$. (c) Show that $S \stackrel{d}{=} 100N_1 + 200N_2 + 300N_3$, where $N_1$, $N_2$, and $N_3$ are independent
Poisson r.v.’s. Find $E\{N_i\}$, $i = 1,2,3$.


35. Solve Exercises 33 and 34 for the instance where a separate claim in the first group is
uniformly distributed on [100,200], while in the second group, it is uniformly distributed on
[200,300]. (Advice: Observe that the distribution of a separate “imagined” claim Y is a
mixture of two uniform distributions. The intervals [100,200) and [200,300] do not intersect,
so it is not difficult to write the distribution function (or the density) for the first
group and for the second, and write or graph the mixture with appropriate weights.)


36. Solve Problem 35 for the case when the uniform distributions mentioned in this problem are 
those on [100,300] and on [200,400], respectively. 


Section 4 


37.* Apply Theorem 12 in the instance when $\lambda = 1,2,...$, and $N_\lambda$ is a $(p,\lambda)$-binomial r.v., that is,
$P(N_\lambda = k) = \binom{\lambda}{k} p^k (1-p)^{\lambda-k}$, $k = 0,1,...,\lambda$. What are we dealing with in the case $p = 1$?
(Advice: Use the CLT.)


38. Assume that in the scheme of Section 4.2, the X’s are exponentially distributed. (a) Does
the estimate of the relative loading coefficient $\theta$ depend on the parameter of the exponential
distribution mentioned? (b) Find the estimate for $\theta$ for the case when $N$ is (i) a Poisson r.v.,
(ii)* a Poisson-Poisson r.v., (iii)* a negative binomial r.v.


39. Find $\theta$ and $c$ in Examples 4.2-1 (a) for the case when the insurer pays only 80% of each
claim; (b) when the distribution of the X’s is exponential; (c) when the distribution of the X’s is
uniform.


40.* Find $\theta$ and $c$ in Examples 4.2-2 (a) for the case when the insurer pays only 80% of each
claim; (b) when the distribution of the X’s is exponential; (c) when the distribution of the X’s is
lognormal.


41.* Similar to Example 4.3-1, find a precise bound for $\theta$ in the case when the X’s are
exponentially distributed.


Chapter 4 


Random Processes and their Applications. 
I: Counting and Compound Processes. 
Markov Chains. 

Modeling Claim and Cash Flows 


This chapter has two goals—to list some general facts from the theory of random processes, 
and to consider in detail various models of cash flows, including the surplus and claims 
processes. 

We start with an overview and after that we will explore particular processes important 
for insurance modeling in greater detail. 


1 A GENERAL FRAMEWORK 
AND TYPICAL SITUATIONS 


1.1 Preliminaries 


A random variable was defined in Chapter 0 as a function $X(\omega)$ on a space $\Omega = \{\omega\}$ of
elementary outcomes $\omega$. Usually, we omit the argument $\omega$ and write $X$ instead of $X(\omega)$.

A random or stochastic process is defined as a collection of r.v.’s $X_t(\omega)$ where $t$ is a
running parameter. We can view $X_t(\omega)$ as a function of two arguments: $\omega$ and $t$.

For a fixed $t$, the function $X_t(\omega)$ as a function of $\omega$ is a random variable.

For a fixed $\omega$, we have a function of $t$. In this case, given $\omega$, the function $X_t = X_t(\omega)$ is
also called a (particular) realization, or a trajectory, of the random process.

EXAMPLE 1. Suppose that the sample space $\Omega$ consists of only two elementary outcomes:
$\omega_1$ and $\omega_2$. Say, we toss a coin. Assume that $X_t(\omega_1) = t$ while $X_t(\omega_2) = t^2$. So,
depending on which $\omega$ occurs, we deal with either a linear function or a parabola.

In the case of a regular coin, it amounts to a random selection, with equal probabilities,
of one out of two functions (rather than numbers): either $t$ or $t^2$.


As we did when dealing with r.v.’s, we usually omit the argument $\omega$ and just
write $X_t$.

In practically all models of this book, $t$ is a time parameter and $X_t$ determines the evolution
of some characteristic in time. In general, $t$ may be of any nature. For instance, $t$ could
represent the distance from the beginning of a trench dug by a gold miner to a particular
place in the trench. Then $X_t$ can represent the (random) concentration of gold at point $t$.




FIGURE 1. 


When the quantity $X_t$ is not a r.v. but a random vector (r.vec.), we talk about a random
vector process. For example, $X_t$ may be the (random) vector of stock prices for various
assets in a financial market at time $t$.


Formally, the parameter $t$ may take on values from an arbitrary set. When $t$ is time, we
consider two cases: the continuous time case when $t$ takes values from an interval, and
the discrete time case when $t = 0,1,2,...$ . In the latter case, $\{X_t\}$ is a sequence of r.v.’s
$\{X_0, X_1, X_2, ...\}$.

One of the simplest processes in continuous time is a counting process $N_t$, $t \ge 0$,
representing the total number of events of a certain type that have occurred prior to or at time $t$.
For example, $N_t$ may be the number of customers who have entered a store by time $t$. The
same may concern the number of falling stars you observed in the sky, the number of cosmic
particles registered by some device, etc. For us, the most important case is the process
that counts claims being received by an insurance company.


For brevity, we call occurrence of the events above arrivals, and time-intervals between 
consecutive arrivals interarrival times. These are also sometimes referred to as sojourn 
times. 

A typical realization of a counting process is shown in Fig.1. The symbols $\tau_i$ there denote
interarrival times. For $t < \tau_1$ (no arrivals have occurred), $N_t = 0$. If $\tau_1 \le t < \tau_1 + \tau_2$ (after
the first arrival and before the second), $N_t = 1$, and so on.


Note that the realization graphed in Fig.1 is continuous from the right. This reflects the
fact that, when defining $N_t$ as the number of arrivals during the interval $(0,t]$, we include
the end point $t$. So, if an arrival occurs at the last moment $t$ in the interval $(0,t]$, we count
it.
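
The way $N_t$ is read off from the interarrival times can be sketched in a few lines of Python. This is only an illustration: the function names `arrival_times` and `count_at` and the three sample interarrival times are assumptions made here and do not come from the text.

```python
from bisect import bisect_right
from itertools import accumulate


def arrival_times(interarrivals):
    """Cumulative sums tau_1, tau_1 + tau_2, ...: the moments of the arrivals."""
    return list(accumulate(interarrivals))


def count_at(t, arrivals):
    """N_t = number of arrivals in (0, t]; an arrival exactly at t is counted,
    which is what makes the realization continuous from the right."""
    return bisect_right(arrivals, t)


taus = [0.5, 1.5, 1.0]            # hypothetical interarrival times
T = arrival_times(taus)           # [0.5, 2.0, 3.0]
for t in (0.4, 0.5, 1.9, 2.0, 10.0):
    print(t, count_at(t, T))
```

The use of `bisect_right` rather than `bisect_left` is what includes the end point $t$: at $t = 0.5$ the count is already 1.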

Another important process in insurance models is a surplus process
(sometimes also called a reserve process). Such a process, $R_t$, measures the monetary
surplus of a risk portfolio at time $t$. We may also view it as the amount of the (monetary)
reserve fund corresponding to this portfolio. For example, if $u$ is the initial surplus, $S_t$ is
the amount paid by the company by time $t$, and $c_t$ is the premium paid to the company by
the same time, then $R_t = u + c_t - S_t$. The process $S_t$ is called a claim process. Usually,
though not always, we set $c_t = (1+\theta)E\{S_t\}$, where $\theta$ is a loading coefficient. Later, in this
chapter and Chapters 5-6, we will consider the process $R_t$ in detail.
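
As a small numerical sketch of the relation $R_t = u + c_t - S_t$, consider the following Python fragment. It assumes, purely for illustration, that the premium accrues linearly, $c_t = ct$ with a constant rate $c$; the function name `surplus` and all the numbers are hypothetical.

```python
def surplus(t, u, premium_rate, claims):
    """R_t = u + c_t - S_t with c_t = premium_rate * t (an assumed linear premium).

    claims is a list of (payment_time, amount) pairs; S_t is the total
    amount paid by time t, payments made exactly at t included.
    """
    paid = sum(amount for when, amount in claims if when <= t)
    return u + premium_rate * t - paid


# Hypothetical data: initial surplus 10, premium 2 per unit of time, two claims.
claims = [(1.0, 3.0), (2.5, 4.0)]
for t in (0.0, 1.0, 3.0):
    print(t, surplus(t, 10.0, 2.0, claims))
```

With these numbers the surplus is 10 at $t = 0$, drops to 9 right after the first claim at $t = 1$, and is again 9 at $t = 3$.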


Next, we consider two particular but important classes of processes. For now, we do it 
briefly. Later we will return to these processes and explore them in more detail. 




1.2 Processes with independent increments 


For a half-open interval $\Delta = (t_1,t_2]$, we define the increment of a process $X_t$ as the r.v.
$X_\Delta = X_{t_2} - X_{t_1}$. For instance, for a counting process $N_t$, the increment $N_\Delta$ is the number of
arrivals during the interval $\Delta$.

A process $X_t$, $t \ge 0$, is called a process with independent increments if for any collection
of disjoint intervals $\Delta_1, ..., \Delta_k$, the r.v.’s $X_0, X_{\Delta_1}, ..., X_{\Delta_k}$ are mutually independent. Note that
since we included $X_0$ into the definition, all increments of the process do not depend on the
initial value $X_0$ of the process. For example, $X_t - X_0$ does not depend on $X_0$.

If time is discrete, for $t = 1,2,...$ we can write


$$X_t = X_0 + (X_1 - X_0) + (X_2 - X_1) + \dots + (X_t - X_{t-1}) = X_0 + X_{(0,1]} + X_{(1,2]} + \dots + X_{(t-1,t]}. \tag{1.2.1}$$


So, $X_t$ is a sum of independent r.v.’s.
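
The representation (1.2.1) says that, in discrete time, a trajectory is just the running sum of the initial value and the increments. A minimal Python sketch follows; the choice of i.i.d. increments equal to $\pm 1$ and the name `path_from_increments` are assumptions made for the illustration.

```python
import random
from itertools import accumulate


def path_from_increments(x0, increments):
    """Return [X_0, X_1, ..., X_t], where X_k = X_0 + (sum of the first k increments)."""
    return list(accumulate(increments, initial=x0))


rng = random.Random(0)                               # seed fixed only for reproducibility
steps = [rng.choice((-1, 1)) for _ in range(10)]     # assumed independent +/-1 increments
path = path_from_increments(0, steps)
print(path)
```

Taking successive differences of `path` recovers `steps`, which is exactly the first line of (1.2.1) read from right to left.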


In the continuous time case, we can get a similar representation if for a natural $n$, we
divide the interval $[0,t]$ into intervals $\Delta_k = \left(\frac{(k-1)t}{n}, \frac{kt}{n}\right]$, $k = 1,...,n$, and write


$$X_t = X_0 + X_{\Delta_1} + \dots + X_{\Delta_n}. \tag{1.2.2}$$


The random terms (addends) in the r.-h.s. of (1.2.2) are independent. Nevertheless, con- 
tinuous time processes with independent increments are not as simple as one may expect 
proceeding from (1.2.2). This is illustrated by the following two classical models. 


1.2.1 The simplest counting process 


We call a counting process $N_t$ the simplest if all interarrival times are independent identically
distributed exponential r.v.’s. Later, when we show that this process is connected with
the Poisson distribution, we will call it a Poisson process.

Assume that we have been watching the process defined until time $t$. In particular, at
time $t$ we know how much time has elapsed since the last arrival. However, by virtue of the
lack of memory property of the exponential distribution (see Section 0.3.2.2), the amount
of time we must wait for the next arrival after time $t$ does not depend on how long we have
already been waiting. The process starts over as from the very beginning. So we know that
the remaining time until the next arrival has an exponential distribution, and the parameter
of this distribution is the same as for a separate interarrival time. This parameter is fixed
since we assumed the interarrival times to be identically distributed.

Furthermore, the next interarrival time also does not depend on what happened before
time $t$ since interarrival times are independent. Thus, for any interval $\Delta = (t,t']$, the increment
$N_\Delta$ does not depend on the evolution of the process before $t$, and hence $N_t$ is a process
with independent increments. We consider this process in detail in Section 2.
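
The lack of memory property used above can be checked empirically. The sketch below is a rough Monte Carlo illustration, not a proof: it compares an estimate of $P(\tau > s + t\,|\,\tau > s)$ with the exact value $P(\tau > t) = e^{-\lambda t}$ for an exponential r.v. $\tau$. The parameter values, the seed, and the name `conditional_tail` are assumptions.

```python
import math
import random


def conditional_tail(samples, s, t):
    """Estimate P(tau > s + t | tau > s) from a list of sampled values of tau."""
    survivors = [x for x in samples if x > s]
    return sum(x > s + t for x in survivors) / len(survivors)


rng = random.Random(1)             # seed fixed only for reproducibility
lam = 1.0                          # assumed parameter of the exponential distribution
samples = [rng.expovariate(lam) for _ in range(200_000)]

estimate = conditional_tail(samples, s=1.0, t=0.5)
exact = math.exp(-lam * 0.5)       # P(tau > 0.5), no conditioning at all
print(estimate, exact)
```

The two printed numbers should be close: having already waited $s = 1$ does not change the distribution of the remaining waiting time.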


1.2.2 Brownian motion 


In the case of continuous time, we call a process continuous if with probability one 
its realizations are continuous functions. A counting process (see, for example, Fig.1) is, 
clearly, not continuous. The next scheme, in a certain sense, is contrasting to the previous. 




Let $w_t$, $t \ge 0$, be a continuous random process with independent increments such that
$w_0 = 0$. Suppose that for any interval $\Delta$, the increment $w_\Delta$ is a normal r.v. with zero mean
and variance $|\Delta|$, where $|\Delta|$ is the length of $\Delta$. Such a process is called the standard Wiener
process or Brownian motion. Note that we do not derive but postulate independence of
increments, and we skip a formal proof that such a process indeed exists, i.e., is well defined.
One can find such a proof in any advanced textbook on random processes, e.g., in [47] or
[70]. In Example 1 below, we will discuss how to simulate trajectories of such a process.

Since $w_0 = 0$, we can write $w_t = w_t - w_0 = w_{(0,t]}$, the increment over $(0,t]$. Hence, $w_t$ is
a normal r.v. with mean zero and variance $t$. In particular, $w_1$ is a standard normal r.v., and
for this reason we use in the definition of $w_t$ the term “standard”.

The process $w_t$ may be considered a counterpart of a standard normal r.v. in the theory
of processes. Originally, the term Brownian motion was used for the motion of a particle
totally immersed in a liquid. The name comes from the botanist R. Brown who first considered
this phenomenon. It turned out that processes of the type $a + bw_t$, where $a, b$ are real
numbers, are suitable for modeling such a physical motion. The rigorous theory was built
mainly by A. Einstein and N. Wiener. Nowadays, Wiener processes, or Brownian motion,
are widely used for modeling various phenomena of different nature, such as diffusion in
physics, certain processes in quantum mechanics, the evolution of stock prices, surplus
processes in insurance, etc.


EXAMPLE 1. To understand how realizations of $w_t$ look, let us choose a real number
$\delta > 0$ and consider points $t_k = k\delta$, where $k = 0, 1, ...$. Let $\Delta_k = (t_{k-1}, t_k]$. Then


$$w_{t_k} = w_{\Delta_1} + \dots + w_{\Delta_k}. \tag{1.2.3}$$


The r.v.’s $w_{\Delta_k}$ are independent normal r.v.’s with zero mean and variance $\delta$ because the
length of each interval is $\delta$.

Set $\xi_k = \frac{1}{\sqrt{\delta}}\, w_{\Delta_k}$ for $k = 1,2,...$. Since we divided $w_{\Delta_k}$ by its standard deviation, the r.v.’s
$\xi_k$ are standard normal (see also (0.2.6.3)) and independent because the intervals $\Delta_k$ do not
overlap.

Because $w_{\Delta_k} = \sqrt{\delta}\,\xi_k$, from (1.2.3) it follows that


$$w_{t_k} = \sqrt{\delta}\,(\xi_1 + \dots + \xi_k). \tag{1.2.4}$$


The last representation gives a way to simulate a realization of Brownian motion. Let,
say, $\delta = 0.01$ and $k = 1,2,...,100$, which means that we consider 100 equally spaced points
in $[0,1]$. The first point $t_1 = 0.01$, the last point $t_{100} = 1$.

In the Excel worksheet in Fig.2, Column A contains 100 standard normal numbers generated
by Excel (not all numbers are shown in the figure). They represent the $\xi$’s. C1 contains
the value of $\delta = 0.01$. In E1 the value SQRT(C1)*A1 is given, which comes from the relation
$\sqrt{\delta}\,\xi_1 = w_{t_1}$. So, E1 corresponds to the value $w_{t_1}$.

The value in E2 is =E1+A2*SQRT(C1), which comes from $w_{t_2} = w_{t_1} + \sqrt{\delta}\,\xi_2$, and so on.
The cell Ek equals E(k−1)+Ak*SQRT(C1), which reflects the formula $w_{t_k} = w_{t_{k-1}} + \sqrt{\delta}\,\xi_k$.
The graph of Column E is given in the first chart.
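
The same recursion is easy to write outside a spreadsheet. The following Python sketch mirrors the worksheet logic, $w_{t_k} = w_{t_{k-1}} + \sqrt{\delta}\,\xi_k$ with $w_0 = 0$; the function name `brownian_from_normals` is an assumption, and the three input numbers are the first three entries of Column A as printed in the worksheet of Fig.2.

```python
import math


def brownian_from_normals(xis, delta):
    """Values w_{t_1}, ..., w_{t_k} at the points t_k = k * delta, built by the
    recursion w_{t_k} = w_{t_{k-1}} + sqrt(delta) * xi_k, starting from w_0 = 0."""
    path, w = [], 0.0
    for xi in xis:
        w += math.sqrt(delta) * xi
        path.append(w)
    return path


xis = [-0.30023, -1.27768, 0.244257]        # first three standard normals from Column A
fine = brownian_from_normals(xis, 0.01)     # plays the role of Column E
coarse = brownian_from_normals(xis, 0.02)   # the same numbers with the step 0.02
print(fine)
print(coarse)
```

With $\delta = 0.01$ the first values are approximately −0.03002, −0.15779, and −0.13337, in agreement with the worksheet. To obtain a full realization one would replace `xis` by 100 generated standard normal numbers, for example values of `random.gauss(0, 1)`.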

Cell C2 and Column G correspond to the same simulation where $\delta = 0.02$, and we consider
50 points in the same interval. It is worth emphasizing that this is not a part of the
previous realization: we took the first 50 values in Column A, but assigned them to 50
equally spaced points in $[0,1]$. For example, while in the first case the number in Cell E1,
−0.03, is the value of the process at the point $t = 0.01$, in the second case the number in
Cell G1, −0.0425, is the value of the process at $t = 0.02$. Thus, the second graph may be
viewed as a new realization.


FIGURE 2. Realizations of Brownian motion on [0,1] with the step
$\delta = 0.01$ (100 points) and 0.02 (50 points). In the figure,
Columns A, E, and G are truncated (not all numbers are shown).
[Excel worksheet: Column A, standard normals; Columns E and G, simulated values of the process;
two charts against time, titled “A realization of Brownian motion, 100 points, δ=0.01” and
“A realization of Brownian motion, 50 points, δ=0.02”.]

We see that the second graph appears smoother, which is not surprising: taking only 50 
points in the same interval, we eliminate 50 independent fluctuations between the chosen 
points. 

The graphs in Fig.2 represent just two possible realizations of the process. If we regenerate
in Column A other normal numbers, the realizations will be different. Fig.3 represents
twenty independent realizations with the step $\delta = 0.01$.

The envelopes of the twenty curves in Fig.3 are consistent with the theory. The distribution
of $w_t$ is normal with zero mean and variance $t$, and hence the distribution of $w_t$
coincides with the distribution of $\sqrt{t}\,Z$ where $Z$ is standard normal. Thus, $w_t$ grows as $\sqrt{t}$
times a random factor that does not depend on $t$. One may say that $w_t$ has the order $\pm\sqrt{t}$,
which is reflected in Fig.3. See also Exercise 4.
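
The claim that the variance of $w_t$ equals $t$ can also be checked by simulation. The sketch below generates many independent paths with the step 0.01 and estimates the variance at $t = 0.25$ and $t = 1$; the number of paths, the seed, and the function name `simulate_checkpoints` are assumptions chosen for the illustration, and the estimates are only approximate.

```python
import math
import random
import statistics


def simulate_checkpoints(n_paths, n_steps, delta, checkpoints, rng):
    """For each simulated path, record the value of w after the step numbers
    listed in checkpoints; returns a dict {step number: list of values}."""
    results = {k: [] for k in checkpoints}
    for _ in range(n_paths):
        w = 0.0
        for step in range(1, n_steps + 1):
            w += math.sqrt(delta) * rng.gauss(0.0, 1.0)
            if step in results:
                results[step].append(w)
    return results


rng = random.Random(2)                        # seed fixed only for reproducibility
vals = simulate_checkpoints(2000, 100, 0.01, (25, 100), rng)
var_quarter = statistics.pvariance(vals[25])  # step 25 corresponds to t = 0.25
var_one = statistics.pvariance(vals[100])     # step 100 corresponds to t = 1
print(var_quarter, var_one)
```

The standard deviations are then roughly 0.5 and 1, that is, $\sqrt{t}$, which is the widening envelope one sees when many realizations are graphed together.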


It may be shown that in the continuous time case, any process with independent increments
may be represented as a certain combination of the Wiener process and processes
of the type $bN_t$, where $b$ is a number, and $N_t$ is the simplest (Poisson) process from the
previous section; see, e.g., [45], [70]. Such combinations are called Lévy processes (after
the French mathematician Paul Lévy).




FIGURE 3. Twenty independent realizations of Brownian motion. 
We consider the Wiener process in more detail in Section 5.1. 


1.3 Markov processes 


Below, we systematically use the notation $P(A\,|\,X)$, which means the probability of event $A$
given the value of a r.v. (or a r.vec.) $X$. In particular, $P(Y \in B\,|\,X)$ is the probability that a
r.v. (or a r.vec.) $Y$ will assume a value from a set $B$, given the value of $X$. For more detail
see Section 0.7, and in particular, p.57.

For a process $X_t$, $t \ge 0$, let the symbol $X^t$ denote the whole trajectory of the process until
time $t$. If time is discrete, $X^t$ is the random sequence $\{X_0, X_1, ..., X_t\}$; if time is continuous,
$X^t$ is the (random) function $X_u$, $0 \le u \le t$.

We say that a process $X_t$ is a Markov process (after A.A. Markov who first considered
such processes in the beginning of the twentieth century) if for any set $B$ on the real line,
and any $t, s \ge 0$,

$$P(X_{t+s} \in B\,|\,X^t) = P(X_{t+s} \in B\,|\,X_t). \tag{1.3.1}$$


To understand this definition, suppose that $t$ is the present time, and the past trajectory
$X^t$ is known. The l.-h.s. of (1.3.1) is the probability that the future value $X_{t+s}$ will be in
a set $B$, given the whole history of the evolution of the process by time $t$. The Markov
property (1.3.1) implies that this probability, as a matter of fact, depends not on the whole
past evolution but only on the last (and present for us) value $X_t$. This is indicated in the
r.-h.s. of (1.3.1).


We may say that in the Markov case, “given the present, the future does not depend on 
the past”. 


For discrete time this implies, in particular, that

$$P(X_{t+1} \in B\,|\,X_0, ..., X_{t-1}, X_t) = P(X_{t+1} \in B\,|\,X_t),$$


that is, the value of the process at the next step depends only on where the process is now.

Any process with independent increments is a Markov process. Indeed, $X_{t+s} = X_t + X_{(t,t+s]}$,
and $X_{t+s}$ depends only on $X_t$ and the increment $X_{(t,t+s]}$ which does not depend on
values $X_u$ for $u < t$. The converse assertion is not true.




EXAMPLE 1. Let $X_t = e^{bw_t}$, where $b$ is a number, and $w_t$ is the Wiener process. Such
a process may at first glance look exotic, but as a matter of fact, it may be used, for example,
for modeling the evolution of stock prices. We consider it in more detail in Section 5.1.3.
We have


$$X_{t+s} = \exp\{bw_{t+s}\} = \exp\{bw_t\}\exp\{b(w_{t+s} - w_t)\} = X_t\exp\{bw_{(t,t+s]}\}. \tag{1.3.2}$$


By the definition of $w_t$, the increment $w_{(t,t+s]}$ does not depend on values of the process $w_u$
for $u < t$, and hence, on values $X_u$ for $u < t$. Thus, for any fixed $t$ and $s$, given the value of
the r.v. $X_t$, the r.v. $X_{t+s}$ does not depend on $X_u$ for $u < t$. Consequently, the whole process
is Markov.

Now, we show that increments of this process are dependent. For simplicity, let $b = 1$.
Consider intervals $(0,1]$ and $(1,2]$. Because $w_0 = 0$, the initial value $X_0 = 1$. Then, in view
of (1.3.2),


$$X_{(0,1]} = X_1 - X_0 = X_1 - 1, \qquad X_{(1,2]} = X_2 - X_1 = X_1[\exp\{w_{(1,2]}\} - 1].$$


The r.v. $\exp\{w_{(1,2]}\}$ does not depend on $X_1$. On the other hand, both r.v.’s, $X_{(0,1]}$ and $X_{(1,2]}$,
involve $X_1$. Hence, the increments above are dependent.
We consider a generalization of this example in Section 5.1.3.
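
The dependence of the two increments can be made visible numerically. Since $E\{e^Z\} = e^{1/2}$ for a standard normal $Z$, the conditional mean of $X_{(1,2]}$ given $X_1$ equals $X_1(e^{1/2} - 1)$, so the second increment should be larger on average when the first one is positive. The Python sketch below (sample size and seed are assumptions) splits simulated second increments according to the sign of the first one and compares the averages.

```python
import math
import random

rng = random.Random(3)         # seed fixed only for reproducibility
low, high = [], []             # second increments, split by the sign of the first
for _ in range(50_000):
    x1 = math.exp(rng.gauss(0.0, 1.0))                      # X_1 = exp(w_1), b = 1
    first = x1 - 1.0                                        # increment over (0,1]
    second = x1 * (math.exp(rng.gauss(0.0, 1.0)) - 1.0)     # increment over (1,2]
    (high if first > 0 else low).append(second)

mean_low = sum(low) / len(low)       # average of X_(1,2] when X_(0,1] <= 0
mean_high = sum(high) / len(high)    # average of X_(1,2] when X_(0,1] > 0
print(mean_low, mean_high)
```

If the increments were independent, the two averages would be close to each other; here the second is markedly larger.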


To understand the nature of Markov processes, consider the discrete time case and the
following construction. Let $\xi_1, \xi_2, ...$ be independent r.v.’s. We define a process $X_t$,
$t = 0,1,...$ as follows. $X_0$ is a given r.v.; as above, we call it an initial value. For $t = 1,2,...$, we
define $X_t$ by the recurrence relation


$$X_{t+1} = h_t(X_t, \xi_{t+1}), \tag{1.3.3}$$


where $h_t(x,z)$, $t = 1,2,...$, is a function of two variables.

For example, let $X_t$ be the capital of a person at time $t$, and let the value of the capital at the
next time be $X_{t+1} = (1 + \xi_{t+1})X_t$, where $\xi_{t+1}$ is a random interest over the period $(t,t+1]$.
If we assume the $\xi$’s to be independent of the previous history of the process, we will come to
the model (1.3.3) with $h_t(x,z) = (1+z)x$.

In general, because the $\xi$’s are independent, $X_{t+1}$ depends only on the previous value of the
process $X_t$ and the r.v. $\xi_{t+1}$ which does not depend on the present or the past values of the
process. Clearly, the process $X_t$ so defined is Markov.
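
The recurrence (1.3.3) translates directly into a loop. In the sketch below, the generic driver `simulate` and the step function `capital_step` are hypothetical names, the function $h$ is taken to be the same at every step, and the three interest rates are made-up numbers used only to show the mechanics.

```python
def simulate(h, x0, shocks):
    """Iterate X_{t+1} = h(X_t, xi_{t+1}) and return the path [X_0, X_1, ...]."""
    path = [x0]
    for xi in shocks:
        path.append(h(path[-1], xi))
    return path


def capital_step(x, z):
    """h(x, z) = (1 + z) x: the capital x grows at the (random) interest rate z."""
    return (1.0 + z) * x


rates = [0.5, -0.5, 1.0]                      # hypothetical interest rates for three periods
print(simulate(capital_step, 100.0, rates))   # [100.0, 150.0, 75.0, 150.0]
```

Only the current value `path[-1]` and the new shock enter each step, which is the Markov property in computational form.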

It is important that, as a matter of fact, any discrete-time Markov process admits the 
representation (1.3.3), so (1.3.3) is not an example but, in a certain sense, another definition. 

The proof of this fact is constructive, and gives a way to simulate Markov processes. Let
$X_t$ be a Markov process. Since $X_{t+1}$ depends only on $X_t$, the joint distribution of all r.v.’s
$X_t$ may be completely determined by the conditional distributions of each $X_{t+1}$ given the
previous value $X_t$. Let us consider the conditional distribution function of $X_{t+1}$ given $X_t$,
namely the function

$$F_{t+1}(z\,|\,x) = P(X_{t+1} \le z\,|\,X_t = x).$$


We use the following fact proved in Section 0.3.2.1. If $F(z)$ is a distribution function,
$F^{-1}(z)$ is its inverse, and $\xi$ is uniformly distributed on $[0,1]$, then the r.v. $F^{-1}(\xi)$ has the
distribution function $F(z)$.
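
The fact just quoted is the basis of the inverse transform method of simulation. As a hedged illustration, take the exponential distribution function $F(z) = 1 - e^{-az}$, $z \ge 0$, whose inverse is $F^{-1}(u) = -\ln(1-u)/a$; this particular choice of $F$, the parameter value, and the function name `exp_inverse_cdf` are assumptions made for the example.

```python
import math
import random


def exp_inverse_cdf(u, a):
    """Inverse of F(z) = 1 - exp(-a z), z >= 0, evaluated at u in [0, 1)."""
    return -math.log(1.0 - u) / a


rng = random.Random(4)             # seed fixed only for reproducibility
a = 2.0
draws = [exp_inverse_cdf(rng.random(), a) for _ in range(100_000)]

z0 = 0.5
empirical = sum(x <= z0 for x in draws) / len(draws)   # empirical d.f. at z0
exact = 1.0 - math.exp(-a * z0)                        # F(z0)
print(empirical, exact)
```

Note that `random.random()` returns values in $[0,1)$, so the logarithm is always well defined.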




Let r.v.’s $\xi_1, \xi_2, ...$ be independent and uniformly distributed on $[0,1]$, and let the functions
$h_t(x,z)$ from (1.3.3) be defined as follows:


$$h_t(x,z) = F_{t+1}^{-1}(z\,|\,x),$$


where $F_{t+1}^{-1}(z\,|\,x)$ is the inverse of $F_{t+1}(z\,|\,x)$ with respect to $z$ for a fixed $x$. Then, by virtue
of the previously stated fact, the r.v. $h_t(x, \xi_{t+1})$ has the distribution function $F_{t+1}(z\,|\,x)$.


Let us construct a process $Y_t$ by the recurrence relation


$$Y_{t+1} = h_t(Y_t, \xi_{t+1})$$


with $Y_0$ having the same distribution as $X_0$. Note that the process $Y_t$ is not the original
process $X_t$, since the process $Y_t$ has been artificially constructed. However, both processes
have identical probability distributions.

It is noteworthy that we can always take $\xi_1, \xi_2, ...$ as particular identically distributed
r.v.’s; namely, uniform r.v.’s.

The described construction gives a way for simulating Markov processes in discrete time. 


EXAMPLE 2. Consider a Markov process $X_t$ for which $X_0 = 1$ and, given $X_t$, the distribution of $X_{t+1}$ is the Pareto distribution (2.1.1.17) with $a = X_t$. In this case,

$$F_{t+1}(z \mid x) = 1 - z^{-x} \quad \text{for } z \ge 1,$$

and hence

$$h_t(x,z) = F_{t+1}^{-1}(z \mid x) = (1-z)^{-1/x}.$$

So, we set $Y_0 = 1$, and

$$Y_{t+1} = (1 - \xi_{t+1})^{-1/Y_t}. \tag{1.3.4}$$

A corresponding Excel worksheet is presented in Fig.4. Column A contains twenty values of the uniform $\xi$'s; the time moments $t$ are in Column D; and the values of the process are in Column E. For example, in accordance with the recurrence formula (1.3.4), the command for Cell E3 is =(1-A2)^(-1/E2).
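The recurrence (1.3.4) can also be run outside a spreadsheet. The following is a minimal Python sketch under stated assumptions: the names `step` and `simulate_chain` are illustrative and do not come from the text, and Python's `random.random()` plays the role of the uniform $\xi$'s.

```python
import random


def step(y, xi):
    """One step of (1.3.4): Y_{t+1} = (1 - xi)^(-1 / Y_t)."""
    return (1.0 - xi) ** (-1.0 / y)


def simulate_chain(steps, seed=None):
    """Return [Y_0, ..., Y_steps] with Y_0 = 1 (illustrative helper, not from the text)."""
    rng = random.Random(seed)
    path = [1.0]
    for _ in range(steps):
        xi = rng.random()  # uniform on [0, 1), so 1 - xi is never zero
        path.append(step(path[-1], xi))
    return path


if __name__ == "__main__":
    for t, y in enumerate(simulate_chain(20, seed=2015)):
        print(t, round(y, 6))
```

Feeding `step` the first two uniform numbers from Column A of Fig.4 should reproduce Cells E2 and E3 up to rounding.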


In the following sections, we consider in detail particular types of processes important in 
actuarial modeling. 


2 POISSON AND OTHER COUNTING PROCESSES 
2.1 The homogeneous Poisson process 


We come back to the model of Section 1.2.1. Let $\tau_1, \tau_2, \ldots$ be consecutive interarrival times. Then $T_n = \tau_1 + \ldots + \tau_n$ is the time of the $n$th arrival. As in Section 1.2.1, we assume that the r.v.'s $\tau$ are independent and exponentially distributed with the same parameter, which we will denote by $\lambda$. As before, let $N_t$ be the total number of arrivals by time $t$.




Excel worksheet (Column A: standard uniform random numbers; Column D: time t; Column E: value Y_t of the process). The graph of one realization (Y_t against t) is not reproduced here.

Row   A           D    E
 1    0.663625     0   1
 2    0.429548     1   2.972872
 3    0.535569     2   1.207819
 4    0.464614     3   1.886988
 5    0.060457     4   1.392488
 6    0.250343     5   1.045802
 7    0.868984     6   1.317216
 8    0.586566     7   4.678505
 9    0.799005     8   1.207788
10    0.309793     9   3.775156
11    0.102023    10   1.103196
12    0.944273    11   1.102461
13    0.682241    12   13.72136
14    0.410047    13   1.087143
15    0.69396     14   1.624844
16    0.062929    15   2.072407
17    0.926206    16   1.03186
18    0.028657    17   12.50343
19    0.335459    18   1.002328
20    0.999695    19   1.503371
21                20   217.9559

FIGURE 4. Simulation of the Markov process from Example 2. The worksheet and the graph of one realization


Now note that the number of arrivals $N_t$ is not less than $n$ if and only if the $n$th arrival occurred prior to or at time $t$. Consequently,

$$P(N_t \ge n) = P(T_n \le t), \tag{2.1.1}$$

and to find the distribution of $N_t$, it suffices to find the distribution of $T_n$.

Each exponentially distributed $\tau$ has the $\Gamma$-density $f_{\lambda 1}$ in the notation of (2.1.1.10). By Proposition 2.4, the density of $T_n$ is the convolution $f_{\lambda 1} * f_{\lambda 1} * \cdots * f_{\lambda 1} = f_{\lambda n}$, the $\Gamma$-density with parameters $\lambda$, $n$. Again using (2.1.1.10), we arrive at the density of $T_n$:

$$f_{T_n}(x) = \frac{\lambda^n}{(n-1)!}\, x^{n-1} e^{-\lambda x} \quad \text{for } x \ge 0. \tag{2.1.2}$$


Hence,

$$P(T_n \le t) = \int_0^t f_{T_n}(x)\,dx = \frac{\lambda^n}{(n-1)!}\int_0^t x^{n-1} e^{-\lambda x}\,dx = 1 - e^{-\lambda t}\left(1 + \frac{\lambda t}{1!} + \cdots + \frac{(\lambda t)^{n-1}}{(n-1)!}\right). \tag{2.1.3}$$

The last integral is standard and may be found in many Calculus textbooks. To check that (2.1.3) is true, one may take the derivative of its r.-h.s. and make sure that it equals $f_{T_n}(t)$, that is, the derivative of the l.-h.s. (When you differentiate, all terms will cancel except one.) At $t = 0$, the r.-h.s. and the l.-h.s. are both equal to zero. So, the r.-h.s. and the l.-h.s. are equal to each other for all $t$. We consider this in detail in Exercise 7.


Combining (2.1.1) and (2.1.3), we write

$$P(N_t = n) = P(N_t \ge n) - P(N_t \ge n+1) = P(T_n \le t) - P(T_{n+1} \le t) = e^{-\lambda t}\frac{(\lambda t)^n}{n!}. \tag{2.1.4}$$


Thus, $N_t$ has the Poisson distribution with parameter $\lambda t$. In particular,

$$E\{N_t\} = \lambda t. \tag{2.1.5}$$

The last formula looks very natural: the mean number of arrivals during time $t$ is proportional to $t$. Setting $t = 1$, we see that the parameter $\lambda = E\{N_1\}$, the mean number of arrivals during a unit time. On the other hand, since the $\tau_i$'s are exponential with parameter $\lambda$, the mean interarrival time is $E\{\tau_i\} = 1/\lambda$. Hence,

$$E\{N_1\} = 1/E\{\tau_i\}. \tag{2.1.6}$$

For example, if the mean interarrival time $E\{\tau_i\} = \frac{1}{4}$ hour, then the mean number of arrivals during an hour is 4, which again sounds quite natural.

> However, the reader should not be misled: such a simple formula is not true in general. 
If the $\tau$'s are not exponential, one can hope only for the asymptotic relation

$$\frac{1}{t} N_t \xrightarrow{P} \frac{1}{E\{\tau_i\}} \quad \text{as } t \to \infty. \tag{2.1.7}$$

(See also Exercise 8 and, e.g., [10], [38], [50], [122]. For the definition of convergence in probability, $\xrightarrow{P}$, see Section 0.5.) <

Consider now an interval $\Delta = (t, t+s]$ and the increment $N_\Delta$, that is, the number of arrivals during $\Delta$. In view of the memoryless property of the exponential distribution, we do not have to calculate the distribution of $N_\Delta$ anew: at any moment $t$, the process starts over as from the beginning, and its evolution does not depend on what happened before $t$. Consequently, for $P(N_\Delta = n)$ we should take the same formula (2.1.4) and replace $t$ by $|\Delta|$, the length of the interval $\Delta$. This gives

$$P(N_\Delta = n) = e^{-\lambda|\Delta|}\frac{(\lambda|\Delta|)^n}{n!}. \tag{2.1.8}$$


Thus, we have established the following properties of the process $N_t$:

P1. $N_0 = 0$ with probability one (since the time of the first arrival, $\tau_1$, is positive with probability one, and at the time zero, we do not observe an arrival).

P2. $N_t$ is a counting process with independent increments.

P3. For any $\Delta$, the r.v. $N_\Delta$ has the Poisson distribution with parameter $\lambda|\Delta|$.

These properties may be considered a new definition of the process $N_t$ since they are equivalent to the original definition in terms of the interarrival times $\tau_i$.
Indeed, from P3 it follows that for the first arrival

$$P(\tau_1 > t) = P(N_t = 0) = e^{-\lambda t}, \tag{2.1.9}$$

that is, $\tau_1$ is exponentially distributed. For $\tau_2$, using P2 and (2.1.8), we have

$$P(\tau_2 > t \mid \tau_1 = s) = P(\text{no arrivals in } (s, s+t] \mid \tau_1 = s) = e^{-\lambda t}.$$

Consequently, $\tau_2$ is also exponential and independent of $\tau_1$. The other $\tau$'s are considered similarly.




The process described is called a homogeneous Poisson process, and the corresponding 
flow of arrivals (for example, a flow of claims)—a Poisson flow. 


EXAMPLE 1. Assume that the flow of claims being received by the claim department 
of an insurance company is well approximated by a Poisson flow and that the mean time 
between two consecutive claims is half an hour. 

(a) Find the expected value and the variance of the number of claims during the period between 2pm and 6pm. We choose an hour as a unit of time. In view of the memoryless property, we can consider any time, including 2pm, an initial time. Since $E\{\tau_i\} = 1/\lambda = 1/2$, we have $\lambda = 2$, and $E\{N_{(2,6]}\} = E\{N_4\} = 4\lambda = 8$. In the case of the Poisson distribution, the variance $Var\{N_{(2,6]}\} = E\{N_{(2,6]}\} = 8$.


(b) Find the probability that there will be exactly 10 claims, and the probability that there will be at most 10 claims, during the same period. By (2.1.8), $P(N_{(2,6]} = 10) = P(N_4 = 10) = e^{-2\cdot 4}(2\cdot 4)^{10}/10! \approx 0.099$. The probability $P(N_4 \le 10) = \text{PoissonDist}(10; 8)$, where the symbol PoissonDist$(x; \lambda)$ stands for the Poisson distribution function (in $x$) with parameter $\lambda$. We will use symbols of this type (which do not look very mathematical) when direct calculations are cumbersome and require software. Such symbols coincide with or are close to the corresponding commands in popular programs like Excel.

In our case, PoissonDist$(10; 8) = \sum_{k=0}^{10} e^{-8}\, 8^k/k!$. It hardly makes sense to evaluate the last sum by hand, but a computer gives $\approx 0.816$.


(c) Find the probability that if we start to count at 2pm, the seventh claim will come after 5pm. In view of (2.1.1), the probability in consideration is $P(T_7 > 3) = P(N_3 < 7)$. We could choose to compute either side of the last equality. If we choose the left, we can use (2.1.2) with $n = 7$, $\lambda = 2$, and write

$$P(T_7 > 3) = 1 - \int_0^3 f_{T_7}(x)\,dx = 1 - \int_0^3 \frac{2^7}{6!}\, x^6 e^{-2x}\,dx = 1 - \text{GammaDist}(3; 2, 7) \approx 0.606,$$

where GammaDist$(x; a, \nu)$ stands for the $\Gamma$-distribution function (in $x$) with parameters $a$, $\nu$. There is a corresponding command in Excel.
If we prefer to compute $P(N_3 < 7)$, we write $P(N_3 < 7) = P(N_3 \le 6) = \text{PoissonDist}(6; 6)$, since $E\{N_3\} = 2\cdot 3$. The answer is, naturally, the same.
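The three numbers in this example can be checked without Excel. The sketch below is one possible way to do so in Python; the function names `poisson_pmf`, `poisson_cdf`, and `gamma_cdf_integer_shape` are illustrative stand-ins for the text's PoissonDist and GammaDist symbols, and the Gamma distribution function is computed through identity (2.1.3), which assumes an integer second parameter.

```python
import math


def poisson_pmf(n, mean):
    """P(N = n) for a Poisson r.v. with the given mean, as in (2.1.4)."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def poisson_cdf(n, mean):
    """P(N <= n); plays the role of PoissonDist(n; mean) in the text."""
    return sum(poisson_pmf(k, mean) for k in range(n + 1))


def gamma_cdf_integer_shape(x, lam, n):
    """P(T_n <= x) via (2.1.3); assumes n is a positive integer."""
    return 1.0 - poisson_cdf(n - 1, lam * x)


if __name__ == "__main__":
    lam = 2.0  # claims per hour, since the mean interarrival time is 1/2 hour
    print("P(N_4 = 10)  =", poisson_pmf(10, lam * 4))
    print("P(N_4 <= 10) =", poisson_cdf(10, lam * 4))
    print("P(T_7 > 3)   =", 1.0 - gamma_cdf_integer_shape(3.0, lam, 7))
```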


EXAMPLE 2. Assume now that for the same problems, the information given concerns the number of claims rather than interarrival times. For example, suppose it is given that on the average the company receives 5 claims each 6 hours. This means that $E\{N_6\} = 5 = \lambda\cdot 6$, so $\lambda = 5/6$, and we may repeat all calculations with the new $\lambda$.


EXAMPLE 3. Consider the same process as in the above examples. Suppose it was noticed that, on the average, in half of the cases, half an hour or more elapses before the subsequent claim arrives. Translated into our notation, this says that $P(\tau_i > 1/2) = 1/2$. Hence, $e^{-\lambda/2} = 1/2$, and $\lambda = 2\ln 2 \approx 1.386$. After that, we proceed as above.


The model described provides a simple way to simulate Poisson variables and the Poisson process itself. We simulate a sequence of values of independent exponential variables with a parameter $\lambda$, which will represent the values of consecutive interarrival times. Then, we sum them up until the sum exceeds one. In accordance with what we have proved, the number of terms in such a sum, not counting the last, has the Poisson distribution with the parameter $\lambda$.

[Figure 5: Excel worksheet with three rows: uniform random numbers, the corresponding exponential values, and their successive sums.]

FIGURE 5. Simulation of a Poisson r.v.

We may simulate values of exponentially distributed r.v.'s in accordance with the inverse-distribution-function method of Section 0.3.2.1. More precisely, consider the exponential d.f. $F(x) = 1 - e^{-\lambda x}$, and its inverse $F^{-1}(y) = -\frac{1}{\lambda}\ln(1 - y)$. Then, in accordance with Proposition 7 of Section 0.3.2.1, if $Z$ is a r.v. uniformly distributed on $[0,1]$, then the r.v. $X = -\frac{1}{\lambda}\ln(1 - Z)$ is exponential with parameter $\lambda$.

Note that $1 - Z$ is also uniformly distributed on $[0,1]$; the proof is left to the reader. Therefore, we can use $\ln Z$ instead of $\ln(1 - Z)$, because their distributions are identical.

Fig.5 presents an Excel worksheet. Row 1 contains random numbers (denote them by $Z$) uniform on $[0,1]$, generated by Excel. The parameter $\lambda$ is specified in Cell A7.

Numbers in Row 2 are obtained by the formula $-\frac{1}{\lambda}\ln Z$. They are independent values of a r.v. having the exponential distribution with the parameter $\lambda$. (These values are, of course, only pseudo-independent, since we used a computer to generate them.)

Row 3 contains the successive sums of the numbers from Row 2, so the numbers in Row 3 correspond to the arrival times. We see that there were 5 arrivals in the first unit of time, 1 arrival in the second, 4 in the third, and so on. These numbers simulate a sequence of independent values of a Poisson random variable. Excel is not very convenient for such simulation, so we restrict ourselves to this simple worksheet.
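Outside a spreadsheet, the same recipe takes only a few lines. The following Python sketch is an illustration under stated assumptions, not the worksheet itself: `exponential_sample` and `poisson_sample` are hypothetical helper names, and $1 - Z$ is used inside the logarithm only to avoid $\ln 0$ (as noted above, $Z$ and $1 - Z$ have the same distribution).

```python
import math
import random


def exponential_sample(lam, rng):
    """Inverse-distribution-function method: -(1/lam) * ln(Z), Z uniform."""
    z = 1.0 - rng.random()  # lies in (0, 1], so the logarithm is finite
    return -math.log(z) / lam


def poisson_sample(lam, rng):
    """Number of arrivals in one unit of time: sum interarrival times
    until the sum exceeds one, not counting the last term."""
    total = 0.0
    count = 0
    while True:
        total += exponential_sample(lam, rng)
        if total > 1.0:
            return count
        count += 1


if __name__ == "__main__":
    rng = random.Random(2015)
    values = [poisson_sample(4.0, rng) for _ in range(10)]
    print(values)  # ten simulated values of a Poisson r.v. with parameter 4
```

For a large sample, the empirical mean and variance should both be close to $\lambda$, which gives a quick sanity check of the construction.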


2.2 The non-homogeneous Poisson process 
2.2.1 A model and examples 


The distribution in (2.1.8) depends on the length of the interval but not on its location, 
which indicates that the intensity of arrivals does not change in time. In many real situa- 
tions, this may be assumed only for a short period of time. For example, the intensity of 
the flow of claims coming into an insurance company may vary at different moments of the 
day, days of the week, and seasons of the year. 

To model this, we introduce a function $\lambda(t)$ which is interpreted as the instantaneous intensity of the arrival flow at time $t$. (The significance of the word “instantaneous” is the same as for the speed at a particular moment of a vehicle moving with a varying speed.) We call the function $\lambda(t)$ an intensity function of the process of arrivals, or simply intensity.




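Let

$$\chi(t) = \int_0^t \lambda(s)\,ds. \tag{2.2.1}$$

If $\lambda(t)$ is equal to a constant $\lambda$ for all $t$, then $\chi(t) = \lambda t$.
For an arbitrary interval $\Delta = [t, t+u]$, set

$$\chi_\Delta = \int_\Delta \lambda(s)\,ds = \int_t^{t+u} \lambda(s)\,ds = \int_0^{t+u} \lambda(s)\,ds - \int_0^t \lambda(s)\,ds = \chi(t+u) - \chi(t). \tag{2.2.2}$$

When $\lambda(t) = \lambda$, we have $\chi_\Delta = \lambda u = \lambda|\Delta|$, where $|\Delta|$ stands for the length of $\Delta$.
By analogy with Properties P1-P3 from Section 2.1, we call a process $N_t$, $t \ge 0$, a non-homogeneous Poisson process if it has the following properties.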


PN1. $N_0 = 0$ with probability one.

PN2. $N_t$ is a counting process with independent increments.

PN3. There exists a non-negative function $\lambda(t)$ such that for any interval $\Delta$, the r.v. $N_\Delta$ has the Poisson distribution with parameter $\chi_\Delta$ defined in (2.2.2).

Thus, for any interval $\Delta$,

$$P(N_\Delta = n) = \exp\{-\chi_\Delta\}\frac{\chi_\Delta^n}{n!}. \tag{2.2.3}$$

From (2.2.3) we obtain, in particular, that

$$E\{N_\Delta\} = \chi_\Delta.$$

Considering an interval $\Delta = [0,t]$, we have

$$P(N_t = n) = \exp\{-\chi(t)\}\frac{(\chi(t))^n}{n!}. \tag{2.2.4}$$

The homogeneous case corresponds to a constant intensity $\lambda(t) = \lambda$: in this case $\chi(t) = \lambda t$, and we return to (2.1.4).


EXAMPLE 1. Customers arrive at a service facility according to a non-homogeneous Poisson process with a rate of 3 customers/hour in the period between 9am and 11am. After 11am, the rate decreases linearly from 3 at 11am to zero at 5pm. Find the probability that there will be not more than 15 customers between 10am and 4pm.

Taking 9am as an initial time, we have $\lambda(t) = 3$ for $t \in [0,2]$, and $\lambda(t) = (8 - t)/2$ for $t \in [2,8]$. For the interval $\Delta = [1,7]$, the expected number of customers $\chi_\Delta = \int_1^7 \lambda(s)\,ds = 11.75$, and we get (for example, using Excel) that $P(N_\Delta \le 15) = \text{PoissonDist}(15; 11.75) \approx 0.862$.
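As a cross-check of this example, here is a short Python sketch that integrates the intensity numerically and then evaluates the Poisson distribution function. The helper names `integrate`, `intensity`, and `poisson_cdf` are assumptions made for illustration; the midpoint rule is chosen only because it is exact for a piecewise-linear intensity when the kink falls on a grid point.

```python
import math


def intensity(t):
    """Rate at t hours after 9am: 3 until 11am, then linear down to 0 at 5pm."""
    return 3.0 if t <= 2.0 else (8.0 - t) / 2.0


def integrate(f, a, b, steps=6000):
    """Midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


def poisson_cdf(n, mean):
    """P(N <= n) for a Poisson r.v. with the given mean."""
    return sum(math.exp(-mean) * mean ** k / math.factorial(k) for k in range(n + 1))


if __name__ == "__main__":
    chi = integrate(intensity, 1.0, 7.0)  # expected number of customers, 10am to 4pm
    print("chi =", chi)
    print("P(N <= 15) =", poisson_cdf(15, chi))
```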


In the general case, when the rate can change in time, the number of arrivals may be 
small during a long period, and may be large during a short period. 


EXAMPLE 2. (Can the number of arrivals be finite during an infinite period of time?) The question concerns the value of

$$E\{N_{[0,\infty)}\} = \chi_{[0,\infty)} = \int_0^\infty \lambda(s)\,ds.$$




If the integral above converges, then the mean number of arrivals is finite. Suppose, for instance, that $\lambda(s) = e^{-s}$. Then the mean $E\{N_{[0,\infty)}\} = \int_0^\infty e^{-s}\,ds = 1$, so during the infinite period there will be on the average only one arrival. One may consider any example for which the integral above converges; say, $\lambda(s) = 1/(1+s^2)$. Then the r.v. $N_{[0,\infty)}$ is again a Poisson r.v. with a finite mean value. (What is it equal to?)


EXAMPLE 3 (An explosion). Suppose that, under some circumstances, the rate of claims starts to grow very fast; from a modeling point of view, we can consider $\lambda(t)$ tending to infinity during a finite period. For example, let $\lambda(t) = 1/(1 - t)$. Then $\chi(t) = \int_0^t (1-s)^{-1}\,ds = -\ln(1-t)$ for $t < 1$. Hence, $\chi(t) \to \infty$ as $t \to 1$; that is, during the period $[0,1]$ there will be an infinite number of claims with probability one.


Consider now how interarrival times look in the non-homogeneous case. We will see that, in general, they are no longer exponential or independent.
Indeed, for the first arrival

$$P(\tau_1 > t) = P(N_t = 0) = \exp\{-\chi(t)\}. \tag{2.2.5}$$

If $\lambda(t) = \lambda$, we come to the exponential distribution (2.1.9); however, if, for instance, $\lambda(t) = t$, then $P(\tau_1 > t) = \exp\{-\int_0^t s\,ds\} = \exp\{-t^2/2\}$. In this case, $\tau_1$ has a distribution different from the exponential distribution.
Furthermore, for the $n$th interarrival time $\tau_n$, given the time $T_{n-1}$ of the previous arrival, the conditional probability

$$P(\tau_n > s \mid T_{n-1} = t) = P(N_{(t,t+s]} = 0 \mid T_{n-1} = t) = P(N_{(t,t+s]} = 0) = \exp\left\{-\int_t^{t+s} \lambda(u)\,du\right\}. \tag{2.2.6}$$

We see that the above probability depends on the time of the previous arrival.
However, this should not look scary, because the distribution of the time of the $n$th arrival is still tractable: the formula (2.1.1) is true in the general case, and

$$P(T_n \le t) = P(N_t \ge n) = 1 - P(N_t < n), \tag{2.2.7}$$

which may be computed as a Poisson probability.


EXAMPLE 4. Suppose that the intensity $\lambda(t) = 9(8 - t)^2/64$ for $0 \le t \le 8$; that is, starting from 9, the intensity decreases as a parabola and equals zero at time 8. The expected number of customers during the whole period is $\chi(8) = \frac{9}{64}\int_0^8 (8 - t)^2\,dt = 24$.

Is the probability that at least 20 customers will come within the first half of the period significant? (Notice that the intensity is decreasing rapidly.)

We have $E\{N_4\} = \chi(4) = \frac{9}{64}\int_0^4 (8 - t)^2\,dt = 21$, so the probability should be more than 0.5. More precisely, by (2.2.7), $P(T_{20} \le 4) = P(N_4 \ge 20) = 1 - P(N_4 \le 19) = 1 - \text{PoissonDist}(19; 21) \approx 0.615$.
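For readers who want to see where the values 24 and 21 come from, the integral in (2.2.1) can be worked out in closed form for this intensity. The derivation below is a routine calculus step added for convenience; it uses only the intensity stated in the example.

```latex
\chi(t) = \frac{9}{64}\int_0^t (8-s)^2\,ds
        = \frac{9}{64}\cdot\frac{8^3-(8-t)^3}{3}
        = 24 - \frac{3}{64}\,(8-t)^3, \qquad 0 \le t \le 8,
\]
\[
\chi(8) = 24, \qquad \chi(4) = 24 - \frac{3}{64}\cdot 4^3 = 24 - 3 = 21 .
```

So 21 of the 24 expected customers fall in the first half of the period, which is why the probability in question exceeds one half.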




2.2.2 Another perspective: Infinitesimal approach 


Conditions P3 and PN3 in the above definitions of homogeneous and non-homogeneous processes, respectively, are clear, but their verification in particular cases may cause difficulties. The equivalent definition below proceeds from the behavior of the process merely in small time intervals. It is quite useful for applications and gives an additional insight into the nature of Poisson processes.

Below, the symbol $o(\delta)$ stands for a function which is “much smaller” than $\delta$ for small $\delta$; formally, $\frac{1}{\delta}o(\delta) \to 0$ as $\delta \to 0$. It is important to emphasize that in different formulas $o(\delta)$ may denote different functions, but since we do not need to specify them, we will use the same symbol in different formulas. The reader may find more comments regarding this notation in the Appendix, Section 4.1.

We define $N_t$ from scratch. Suppose that it is a counting process with independent increments such that $N_0 = 0$, and for a function $\lambda(t) \ge 0$ and any positive $t$ and $\delta$,

$$P(N_{(t,t+\delta]} = 1) = \lambda(t)\delta + o(\delta) \quad \text{as } \delta \to 0, \tag{2.2.8}$$

$$P(N_{(t,t+\delta]} > 1) = o(\delta) \quad \text{as } \delta \to 0. \tag{2.2.9}$$

Again, $o(\delta)$ in (2.2.8) may differ from $o(\delta)$ in (2.2.9). In (2.2.8), the term $o(\delta)$ is negligible with respect to $\lambda(t)\delta$, while relation (2.2.9) means that for small $\delta$ (i.e., for small intervals), $P(N_{(t,t+\delta]} > 1)$ is negligibly small.

This may be understood as follows. For a small time interval, the probability of an arrival is proportional to the length of the interval (up to a negligible remainder), and the probability that more than one arrival will occur is negligible. The coefficient of proportionality, $\lambda(t)$, depends on time and, as in Section 2.2.1, is interpreted as the mean number of arrivals per unit of time in a neighborhood of $t$. It is called the rate or the intensity at time $t$.

We define $\chi(t)$ and $\chi_\Delta$ as in (2.2.1) and (2.2.2), respectively.


Proposition 1 For the process $N_t$ defined above and any interval $\Delta$,

$$P(N_\Delta = n) = \exp\{-\chi_\Delta\}\frac{\chi_\Delta^n}{n!}. \tag{2.2.10}$$

A proof will be given in Section 2.2.3.

The infinitesimal approach allows us to easily solve many problems involving varying 
intensities. 

Assume, for example, that arrivals (for instance, claims) are counted (or accepted) only with a probability $p$, perhaps depending on time: $p = p(t)$. In this case, the only change needed is to replace the intensity $\lambda(t)$ by $p(t)\lambda(t)$ in condition (2.2.8).

Indeed, for an arrival to be counted in a period $[t, t+\delta]$, first, the arrival must occur; the probability of this is $\lambda(t)\delta + o(\delta)$. Secondly, once an arrival has occurred, the probability that it is counted is $p(t)$. If we count only arrivals of this type, we should multiply $\lambda(t)\delta + o(\delta)$ by $p(t)$. This gives $(\lambda(t)\delta + o(\delta))p(t) = p(t)\lambda(t)\delta + p(t)o(\delta)$. The last term is negligible with respect to the first, and we may rewrite the whole expression as $p(t)\lambda(t)\delta + o(\delta)$.

This is a particular case of the marked Poisson process. We have already considered this phenomenon in the static case in Section 3.2.2.2 and, in greater detail, in Section 3.3.1.2 and Exercises 3-12, 3-13.
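The replacement of $\lambda(t)$ by $p(t)\lambda(t)$ also suggests a way to simulate a non-homogeneous process, often called thinning: generate a homogeneous flow with a constant rate and count each arrival with a time-dependent probability. The Python sketch below illustrates this idea under stated assumptions; the function name `simulate_by_thinning` and the bound `lam_max` are hypothetical, and the method requires $0 \le \lambda(t) \le$ `lam_max` on the whole horizon.

```python
import random


def simulate_by_thinning(intensity, lam_max, horizon, rng):
    """Arrival times on [0, horizon] of a Poisson process with rate intensity(t).

    Assumes 0 <= intensity(t) <= lam_max for all t in [0, horizon].
    Candidates come from a homogeneous flow with rate lam_max; a candidate
    at time t is counted with probability p(t) = intensity(t) / lam_max.
    """
    arrivals = []
    t = 0.0
    while True:
        t += rng.expovariate(lam_max)  # exponential interarrival time
        if t > horizon:
            return arrivals
        if rng.random() < intensity(t) / lam_max:
            arrivals.append(t)


if __name__ == "__main__":
    rng = random.Random(2015)
    # Intensity of Example 4 in Section 2.2.1; it is bounded by 9 on [0, 8].
    lam = lambda t: 9.0 * (8.0 - t) ** 2 / 64.0
    counts = [len(simulate_by_thinning(lam, 9.0, 4.0, rng)) for _ in range(2000)]
    print("average number of arrivals on [0, 4]:", sum(counts) / len(counts))
```

With the intensity of Example 4 from Section 2.2.1, the average count on $[0,4]$ should be close to $\chi(4) = 21$.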


EXAMPLE 1. After some moment which we view as initial, the flow of customers entering a service facility started to grow with intensity $\lambda(t) = 3(1 + t)$. In order to keep the rate equal to the initial rate 3, the management decides to refuse some customers service depending on the character of the job to be done. Since the type of the next customer is not known in advance, the process of refusals is random. With what probability should the facility accept a customer who arrived at time $t$? With probability $p(t) = 1/(1 + t)$, since then

$$p(t)\lambda(t) = 3.$$


EXAMPLE 2 ([153, N13]¹). During the hurricane season (August, September, October, and November), hurricanes hit the US coast with a monthly rate of 1.25, and each hurricane during this period has a 20% chance of being “major”. Outside of hurricane season (the other months), hurricanes hit at a Poisson rate of 0.25 per month, and each such hurricane has only a 10% chance of being “major”. Determine the probability that a hurricane selected at random is “major”.

The problem concerns the formula

$$p_i = \lambda_i/\lambda$$

from Sections 3.3.1.2 and 3.3.2.1 for the probability that a particular claim comes from the $i$th group. In our case, claims are hurricanes, and the group is that of “major” ones. Using (2.2.1), for the annual intensity of hurricanes we get $\lambda = (1.25)\cdot 4 + (0.25)\cdot 8 = 7$.

To find the intensity of hurricanes from the group mentioned, we should use the same formula (2.2.1), multiplying the intensity $\lambda(s)$ by the corresponding probability of counting $p(s)$. This leads to $\lambda_1 = (1.25)\cdot 4\cdot 0.2 + (0.25)\cdot 8\cdot 0.1 = 1.2$.

Then the probability $p_1 = 1.2/7 \approx 0.1714$.


In the last two examples, we considered two types of arrivals: counted and not counted. Assume now that claims arriving may be of $l$ types, and the probability of the $i$th type, independently of what happened before, is $p_i$. We have already touched on this question in Section 3.3.1.2. In the present framework, it immediately follows that the process of claims of the $i$th type, $N_{ti}$, is Poisson with the intensity $\lambda_i(t) = p_i\lambda(t)$, and, as was proved in Section 3.3.1.2, the processes $N_{ti}$ are independent. If the $i$th type implies a payment of $x_i$, the total payment by time $t$ is

$$\sum_{i=1}^{l} x_i N_{ti}. \tag{2.2.11}$$

See an example in Exercise 18.


2.2.3 Proof of Proposition 1 


Set $p_n(t) = P(N_t = n)$, and for a $\delta > 0$, consider two moments of time: $t$ and $t + \delta$. Note that if $N_{t+\delta} = n$, and $N_{(t,t+\delta]}$ equals some $k \le n$, then $N_t$ should be equal to $n - k$. Hence,


¹ Reprinted with permission of the Casualty Actuarial Society.




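for $n = 0, 1, 2, \ldots$, in view of the independence of increments,

$$\begin{aligned}
p_n(t+\delta) &= P(N_{t+\delta} = n) = \sum_{k=0}^{n} P(N_t = n-k,\ N_{(t,t+\delta]} = k) \\
&= \sum_{k=0}^{n} P(N_t = n-k)\,P(N_{(t,t+\delta]} = k) = P(N_t = n)\,P(N_{(t,t+\delta]} = 0) \\
&\quad + P(N_t = n-1)\,P(N_{(t,t+\delta]} = 1) + \sum_{k=2}^{n} P(N_t = n-k)\,P(N_{(t,t+\delta]} = k).
\end{aligned}$$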
k=2 


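We are going to use (2.2.8)-(2.2.9). As was noted, the symbol $o(\delta)$ in these formulas may denote different functions. Nevertheless, we can combine them, and if, for example, we consider the sum of remainders, then we arrive at another remainder which may again be denoted by $o(\delta)$. First,

$$P(N_{(t,t+\delta]} = 0) = 1 - P(N_{(t,t+\delta]} = 1) - P(N_{(t,t+\delta]} > 1) = 1 - (\lambda(t)\delta + o(\delta)) - o(\delta) = 1 - \lambda(t)\delta + o(\delta),$$

and, secondly,

$$\sum_{k=2}^{n} P(N_t = n-k)\,P(N_{(t,t+\delta]} = k) \le \sum_{k=2}^{n} P(N_{(t,t+\delta]} = k) \le P(N_{(t,t+\delta]} > 1) = o(\delta).$$

Hence, $p_n(t+\delta) = p_n(t)(1 - \lambda(t)\delta + o(\delta)) + p_{n-1}(t)(\lambda(t)\delta + o(\delta)) + o(\delta) = p_n(t) - \lambda(t)p_n(t)\delta + p_{n-1}(t)\lambda(t)\delta + o(\delta)$. We rewrite it as

$$\frac{1}{\delta}\left[p_n(t+\delta) - p_n(t)\right] = -\lambda(t)p_n(t) + \lambda(t)p_{n-1}(t) + \frac{1}{\delta}o(\delta).$$

Letting $\delta \to 0$, and recalling that $\frac{1}{\delta}o(\delta) \to 0$, we come to the differential equation

$$p_n'(t) = -\lambda(t)p_n(t) + \lambda(t)p_{n-1}(t). \tag{2.2.12}$$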


For $n = 0$, since $p_{-1}(t) = P(N_t = -1) = 0$,

$$p_0'(t) = -\lambda(t)p_0(t). \tag{2.2.13}$$

A solution to the last equation is

$$p_0(t) = \exp\{-\chi(t)\}, \tag{2.2.14}$$

where $\chi(t)$ is defined in (2.2.1). The reader who does not remember how to solve equations of this type can easily verify (2.2.14) by substitution. The reader who remembers this should take into account that $p_0(0) = 1$ (we start with no claims), and together with this initial condition, solution (2.2.14) with $\chi(t)$ given in (2.2.1) is the unique solution to the ordinary differential equation (2.2.13).

Once we know $p_0(t)$, we get from (2.2.12) that

$$p_1'(t) = -\lambda(t)p_1(t) + \lambda(t)\exp\{-\chi(t)\}, \tag{2.2.15}$$




which leads to

$$p_1(t) = \chi(t)\exp\{-\chi(t)\}.$$

The reader can again check it by substitution. Continuing the same procedure by induction, we come to solutions for any $n$; namely, to (2.2.4).

Consider an arbitrary interval $\Delta = [t, t+u]$. In this case, proceeding as in the previous section from the independence of increments, we can view the point $t$ as an initial point and use the same formula (2.2.4) with $\chi(t)$ replaced by $\chi_\Delta$. ■
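The system of differential equations (2.2.12)-(2.2.13) can also be checked numerically against the closed form (2.2.4). The following Python sketch does this with a plain Euler scheme; the function names, the step count, and the test intensity $\lambda(t) = t$ (for which $\chi(t) = t^2/2$) are assumptions chosen for illustration only.

```python
import math


def solve_forward_equations(intensity, t_end, n_max, steps=20000):
    """Euler integration of p_n'(t) = -lam(t) p_n(t) + lam(t) p_{n-1}(t)
    for n = 0..n_max, with p_0(0) = 1, p_n(0) = 0 for n >= 1, and p_{-1} = 0."""
    h = t_end / steps
    p = [1.0] + [0.0] * n_max
    for k in range(steps):
        lam = intensity(k * h)
        new_p = [p[0] - h * lam * p[0]]
        for n in range(1, n_max + 1):
            new_p.append(p[n] + h * lam * (p[n - 1] - p[n]))
        p = new_p
    return p


def poisson_pmf(n, mean):
    """Right-hand side of (2.2.4) with chi(t) = mean."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


if __name__ == "__main__":
    t_end = 2.0
    numeric = solve_forward_equations(lambda t: t, t_end, n_max=5)
    chi = t_end ** 2 / 2.0
    for n, value in enumerate(numeric):
        print(n, value, poisson_pmf(n, chi))
```

The two printed columns should agree up to the discretization error of the Euler scheme, which shrinks as `steps` grows.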


Route 1 => page 261 


2.3 The Cox process 


Certainly, the process of arrivals, in particular, the flow of claims arriving at an insurance company, may be more complicated than the Poisson process. An important modification is the Cox process, which is defined as a Poisson process whose intensity $\lambda(t)$ is random. More precisely, instead of $\lambda(t)$ we consider a random process $\Lambda_t$. A similar situation for a static model was explored in Section 3.2.2.1.


EXAMPLE 1. In the time interval $[0,2]$, two periods are distinguished. On $[0,1]$, the random intensity $\Lambda_t$ equals a r.v. $Z_1$, and on $(1,2]$, the intensity $\Lambda_t$ equals a r.v. $Z_2$. The value of the intensity at the switching point $t = 1$ does not matter: a point is an interval of zero length, so the probability of an arrival exactly at this point is zero.

Thus, $N_{[0,1]}$ and $N_{(1,2]}$ are r.v.'s with mixed Poisson distributions described in 3.2.2.1.

Given $Z_1$, the r.v. $N_{[0,1]}$ is Poisson with a mean of $Z_1$. A similar assertion is true for $N_{(1,2]}$. So, we can write that $E\{N_{[0,1]}\} = E\{Z_1\}$, $E\{N_{(1,2]}\} = E\{Z_2\}$, and for the total number of claims

$$E\{N_{[0,2]}\} = E\{N_{[0,1]}\} + E\{N_{(1,2]}\} = E\{Z_1\} + E\{Z_2\}.$$

If $Z_1$, $Z_2$ are independent, the r.v.'s $N_{[0,1]}$ and $N_{(1,2]}$ are also independent, and the distribution of $N_{[0,2]}$ is the convolution of the distributions of the r.v.'s $N_{[0,1]}$ and $N_{(1,2]}$.


In general, the Cox process does not necessarily have independent increments. Values of the process during a particular period may contain some information about the intensity process $\Lambda_t$, which may be used for predicting the behavior of the process in the future. This is apparent in the case when the process $\Lambda_t = \Lambda$, the same r.v. not depending on time.


EXAMPLE 2. The process of claims $N_t$ runs over the period $[0,2]$, the process $\Lambda_t = \Lambda$, and the r.v. $\Lambda$ takes on values 3 and 10 with probabilities $3/4$ and $1/4$, respectively. Suppose that we have observed the process during the period $[0,1]$, and that $N_1$ took a value of 8. How many claims should we expect during the interval $(1,2]$, having this information? What is the probability that $N_{(1,2]}$ will exceed, say, 5?

Note that, for the sake of simplicity, we chose unit intervals, and the particular numbers are the same as in Example 3.2.2.1-5. We have computed there that $P(\Lambda = 10 \mid N_1 = 8) \approx 0.822$. Hence, $E\{N_{(1,2]} \mid N_1 = 8\} \approx 3\cdot 0.178 + 10\cdot 0.822 = 8.754$, whereas the prior unconditional expectation $E\{N_{(1,2]}\} = 3\cdot 0.75 + 10\cdot 0.25 = 4.75$. Next, $P(N_{(1,2]} > 5 \mid N_1 = 8) \approx 1 - (0.822\cdot\text{PoissonDist}(5; 10) + 0.178\cdot\text{PoissonDist}(5; 3)) \approx 0.782$.
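The posterior probability and the two conditional quantities in this example follow from Bayes' formula and can be recomputed directly. The Python sketch below is one way to do it; the helper names `posterior`, `poisson_pmf`, and `poisson_cdf` are illustrative and not from the text. Because it does not round the posterior to three digits, the conditional expectation comes out slightly different from 8.754 in the third decimal.

```python
import math


def poisson_pmf(n, mean):
    """P(N = n) for a Poisson r.v. with the given mean."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def poisson_cdf(n, mean):
    """P(N <= n); plays the role of PoissonDist(n; mean) in the text."""
    return sum(poisson_pmf(k, mean) for k in range(n + 1))


def posterior(prior, observed):
    """Bayes' formula: distribution of the intensity given N_1 = observed.

    `prior` maps each possible intensity value to its prior probability.
    """
    weights = {lam: pr * poisson_pmf(observed, lam) for lam, pr in prior.items()}
    total = sum(weights.values())
    return {lam: w / total for lam, w in weights.items()}


if __name__ == "__main__":
    post = posterior({3.0: 0.75, 10.0: 0.25}, observed=8)
    expected = sum(lam * pr for lam, pr in post.items())
    tail = 1.0 - sum(pr * poisson_cdf(5, lam) for lam, pr in post.items())
    print("P(Lambda = 10 | N_1 = 8)   =", post[10.0])
    print("E{N_(1,2] | N_1 = 8}       =", expected)
    print("P(N_(1,2] > 5 | N_1 = 8)   =", tail)
```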


EXAMPLE 3. (The Polya process). Let $\Lambda$ have the $\Gamma$-distribution with parameters $a$ and $\nu$ as in Examples 3.2.2.1-2 and 6.

The random parameter $\Lambda$ is an intensity. So, given $\Lambda$, the r.v. $N_t$ has the Poisson distribution with parameter $t\Lambda$. Hence, when determining the distribution of $N_t$, we should replace the r.v. $\Lambda$ in Example 3.2.2.1-2 by $t\Lambda$. Then the corresponding $\Gamma$-distribution in the mentioned example will be that with parameters $(a/t)$, $\nu$. Consequently, in accordance with the result from Section 3.2.2.1, $N_t$ has the negative binomial distribution with parameters

$$p_t = \frac{a/t}{1 + (a/t)} = \frac{a}{t + a}$$

and $\nu$. In particular,

$$E\{N_t\} = tE\{\Lambda\} = t\nu/a.$$

Let, for example, $a = 2$ and $\nu = 4$. In this case, $\lambda_{ave} = E\{\Lambda\} = \nu/a = 2$. Assume that we monitored the process until time $t_1 = 1$ and observed that $N_1 = 5$. In accordance with Example 3.2.2.1-6, the conditional distribution of $\Lambda$, given $N_1 = 5$, is the $\Gamma$-distribution with parameters $\tilde{a} = 2 + 1 = 3$, $\tilde{\nu} = 5 + 4 = 9$. Hence, the number of arrivals in the next unit period, $N_{(1,2]}$, has the negative binomial distribution with parameters $p = \tilde{a}/(1 + \tilde{a}) = 3/4$ and $\tilde{\nu} = 9$ (see Section 3.2.2.1). In particular, we should expect on the average not $\lambda_{ave} = 2$ claims, but $E\{\Lambda \mid N_1 = 5\} = 3$ claims.
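The updating rule used in this example is simple enough to state as code. The sketch below is a hedged illustration: the function names are hypothetical, and the general rule (add the elapsed time to $a$ and the observed count to $\nu$) is the standard Gamma-Poisson update, of which the example's $\tilde{a} = 2 + 1$, $\tilde{\nu} = 5 + 4$ is the case of one time unit.

```python
def gamma_poisson_update(a, nu, count, elapsed):
    """Parameters of the conditional Gamma distribution of the intensity
    after observing `count` arrivals over `elapsed` units of time."""
    return a + elapsed, nu + count


def expected_arrivals(a, nu, period=1.0):
    """Mean number of arrivals over `period` when the intensity has the
    Gamma distribution with parameters a and nu: period * nu / a."""
    return period * nu / a


def negative_binomial_p(a, period=1.0):
    """Parameter p_t = a / (t + a) of the negative binomial distribution."""
    return a / (period + a)


if __name__ == "__main__":
    a, nu = 2.0, 4.0
    print("prior mean per unit time:", expected_arrivals(a, nu))
    a_new, nu_new = gamma_poisson_update(a, nu, count=5, elapsed=1.0)
    print("updated parameters:", a_new, nu_new)
    print("updated mean per unit time:", expected_arrivals(a_new, nu_new))
    print("negative binomial p:", negative_binomial_p(a_new))
```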


3 COMPOUND PROCESSES 


A compound process is a process

$$S_{(t)} = \sum_{i=1}^{N_t} X_i, \tag{3.1}$$

where $N_t$ is a counting process, and the $X_i$'s are i.i.d. r.v.'s not depending on the evolution of the process $N_t$.

The interpretation for the insurance model is clear: $N_t$ is the number of claims that have arrived by time $t$, and $X_i$ is the size of the $i$th claim (in other terms, a severity). We write the index $(t)$ in order not to confuse, say, $S_{(2)}$ with $S_2$, which usually denotes $X_1 + X_2$, as opposed to $\sum_{i=1}^{N_2} X_i = S_{(2)}$. In the latter case, the sum contains a random number of addends, and $N_2$ may be larger or less than two.

We also use the habitual notation $X_i$ from Chapter 3. It should not cause confusion with the notation $X_t$ for a random process, as we will not use these notations with different meanings in the same context.

The graph of a typical realization is given in Fig.6. At the initial time, there are no claims and therefore $S_{(0)} = 0$. The first claim of a size of $X_1$ appears at time $\tau_1$, and the process jumps up by $X_1$. Then, during a period whose length equals the second interarrival time $\tau_2$, nothing happens, and after that, at time $\tau_1 + \tau_2$, the second claim of a size of $X_2$ arrives. So, the process jumps up by $X_2$, and the further evolution runs in the same fashion.

FIGURE 6. A typical realization of the compound process

We call $S_{(t)}$ a compound process and, if $N_t$ is a Poisson process, a compound Poisson process.

The scheme (3.1) is very similar to the scheme considered in detail in Chapter 3. The only difference is that now the number of addends, $N_t$, depends on the time parameter $t$. If we are interested in the distribution of $S_{(t)}$ for a particular fixed $t$, then all results of Chapter 3 apply to the current scheme. Dynamic problems connected with the evolution of the process $S_{(t)}$ in time will be considered later in Chapter 6.

Let us return to the distribution of $S_{(t)}$ for a particular $t$ and consider a compound Poisson process. In this case, when using the results for the compound Poisson distribution in Chapter 3, we should replace the parameter $\lambda$ of the Poisson r.v. $N$ in Chapter 3 by $E\{N_t\}$.

First, assume the process $N_t$ is homogeneous. Then $E\{N_t\} = \lambda t$, and in this case (see Section 3.1),

$$E\{S_{(t)}\} = m\lambda t, \qquad Var\{S_{(t)}\} = (\sigma^2 + m^2)\lambda t,$$

where $m = E\{X_i\}$, $\sigma^2 = Var\{X_i\}$.
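These two moment formulas are easy to confirm by simulation. The Python sketch below is an illustration under stated assumptions: `compound_poisson_sample` is a hypothetical helper name, and exponential severities with mean 1.5 are chosen only because then $\sigma^2 = m^2$, so the theoretical variance is $2m^2\lambda t$.

```python
import random


def compound_poisson_sample(lam, t, severity, rng):
    """One simulated value of S_(t): the sum of the claims arriving in [0, t].

    Arrivals form a homogeneous Poisson flow with rate lam;
    `severity(rng)` returns one claim size X_i.
    """
    total = 0.0
    clock = rng.expovariate(lam)  # time of the first arrival
    while clock <= t:
        total += severity(rng)
        clock += rng.expovariate(lam)  # next interarrival time
    return total


if __name__ == "__main__":
    rng = random.Random(2015)
    lam, t, m = 3.0, 2.0, 1.5
    severity = lambda r: r.expovariate(1.0 / m)  # exponential claim with mean m
    values = [compound_poisson_sample(lam, t, severity, rng) for _ in range(20000)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    print("sample mean:", mean, " theory:", m * lam * t)
    print("sample variance:", var, " theory:", 2 * m ** 2 * lam * t)
```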

If $t$ is large, we can apply the normal approximation for the distribution of $S_{(t)}$ in accordance with Theorem 3.4.1-12. In particular, it concerns the approximation for the premium $c_t$ such that $P(S_{(t)} \le c_t) \ge \beta$, where $\beta$ is a given level of security. (See comments in the beginning of Section 3.4.2.) Setting $c_t = (1 + \theta)m\lambda t$, we obtain the counterpart of formula (3.4.2.2):

$$\theta \approx \frac{q_{\beta s}\sqrt{(\sigma^2 + m^2)\lambda t}}{m\lambda t} = \frac{q_{\beta s}}{\sqrt{\lambda t}}\sqrt{1 + k^2}, \tag{3.2}$$

where the coefficient of variation $k = \sigma/m$, $\beta$ is a given security level, and $q_{\beta s}$ is the $\beta$-quantile of the standard normal distribution.
It is worth emphasizing, however, that in the above result, the time moment $t$ is fixed, and $\theta$ in (3.2) concerns this particular chosen $t$. The problem of finding $\theta$ for which $S_{(t)} \le c_t$ for all $t$ from an interval $[0,T]$ is more complicated and will be considered in Chapter 6.


In the case when $N_t$ is non-homogeneous, we replace the parameter $\lambda$ of the Poisson r.v. $N$ in Chapter 3 by $E\{N_t\} = \chi(t)$ defined in Section 2.2.1. In this case

$$E\{S_{(t)}\} = m\chi(t), \qquad Var\{S_{(t)}\} = (\sigma^2 + m^2)\chi(t).$$




If $t$ is large, and $\chi(t) \to \infty$ as $t \to \infty$ (we saw in Example 2.2.1-2 that this is not always the case), then we can again apply the normal approximation for the distribution of $S_{(t)}$ in accordance with the same Theorem 3.4.1-12. In particular, we set $c_t = (1 + \theta)m\chi(t)$ and, instead of (3.2), write

$$\theta \approx \frac{q_{\beta s}}{\sqrt{\chi(t)}}\sqrt{1 + k^2}. \tag{3.3}$$


Since this topic is very similar to what we did in Chapter 3, we omit direct particular 
examples; see also Exercises 23-24. 


Route 1 => page 339 


However, let us consider a nice 

EXAMPLE 1 ([157, N10]?). An insurance policy has aggregated losses according to 
compound Poisson distribution. Claim frequency follows a Poisson process. The average 
number of claims reported is 200. Claim severities are independent and follow an expo- 
nential distribution with a mean of 160,000. 

Management considers any claim that exceeds 1 million to be a catastrophe. Calculate 
the median waiting time (in years) until the first catastrophe claim. 




The word “severities” means the claim sizes, i.e., the $X$’s. Let $K$ be the number of the
first catastrophe claim. For a claim, the probability that it is a catastrophe is
$p = P(X > 10^6) = \exp\{-10^6/(16\cdot 10^4)\} \approx 0.0019305$. Since we may view the appearance
of a catastrophe claim as a “success”, $K$ has the first version of the geometric distribution
with the above parameter $p$.

Since $N_t$ is a Poisson process, the inter-arrival times $\tau_i$ are exponentially distributed with
parameter $a = 200$.

The arrival time of the first catastrophe claim is the r.v. $T = \sum_{i=1}^{K}\tau_i$. In accordance with
the result of Example 3.3.1.3-1, the r.v. $T$ has the exponential distribution with parameter
$\tilde{a} = pa \approx 0.386$.

We are looking for a $t$ such that $P(T > t) = e^{-\tilde{a}t} = 0.5$. Hence, $t = \ln 2/\tilde{a} \approx 1.795$.
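
The arithmetic of this example can be checked with a few lines of Python. This is only a sketch: the variable names are ours, and the claim intensity of 200 is taken to be per year, as the question about waiting time in years suggests.

```python
from math import exp, log

claim_rate = 200.0          # a: intensity of the claim arrival process (assumed per year)
mean_severity = 160_000.0   # mean of the exponential claim size
threshold = 1_000_000.0     # a claim above this size counts as a catastrophe

p = exp(-threshold / mean_severity)      # P(X > threshold) for an exponential X
catastrophe_rate = p * claim_rate        # rate of the thinned Poisson process
median_wait = log(2) / catastrophe_rate  # solves exp(-rate * t) = 0.5

print(f"p = {p:.7f}, rate = {catastrophe_rate:.3f}, median = {median_wait:.3f}")
```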


4 MARKOV CHAINS. 
CASH FLOWS IN THE MARKOV ENVIRONMENT 


4.1 Preliminaries 


A Markov chain is a Markov process $X_t$ in discrete time $t = 0,1,2,\dots$. Often this term is
applied to processes taking integer values, and we will consider this case below. Moreover,
as a rule though not always, $X_t$ will take on non-negative integer values $i = 0,1,2,\dots$.

If $X_t$ assumes a value $i$, the process is said to be in state $i$ at time $t$. Usually, such a
model serves for a description of the evolution of a dynamic system moving from one state 


²Reprinted with permission of the Casualty Actuarial Society.




to another at discrete moments of time. Since the process is Markov, to determine the 
probabilities of its possible realizations, it suffices to specify the following characteristics. 


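A. ${}_tp_{ij} = P(X_{t+1} = j \mid X_t = i)$, the probability that the process, being in state $i$ at time $t$,
will make a transition to state $j$ in the next step. In particular, ${}_tp_{ii}$ is the probability
of staying in the same state $i$ at the next time moment $t+1$.

B. $\pi_{0i} = P(X_0 = i)$, the probability of being in state $i$ at the initial moment of time.

Probabilities ${}_tp_{ij}$ are called transition probabilities, and the matrix

$${}_tP = \|{}_tp_{ij}\| = \begin{Vmatrix} {}_tp_{00} & {}_tp_{01} & {}_tp_{02} & \cdots \\ {}_tp_{10} & {}_tp_{11} & {}_tp_{12} & \cdots \\ {}_tp_{20} & {}_tp_{21} & {}_tp_{22} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{Vmatrix}$$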


is called a transition probability matrix, or briefly, a transition matrix. 

We call the vector of probabilities $\pi_0 = (\pi_{00}, \pi_{01}, \pi_{02}, \dots)$ an initial probability
distribution. If the system starts from a fixed state $i_0$, then $\pi_0 = (\dots,0,1,0,\dots)$, where 1 is in
position $i_0$.

Since the process always stays in some state, the sum of all entries in the vector $\pi_0$ is
equal to one. The same is true for the sum of all probabilities in each row of the matrix ${}_tP$
since the process moves from the state it is in to some state (perhaps the same).

A matrix with non-negative elements and with such a property is called stochastic. Any
stochastic matrix may be the transition probability matrix for some chain. Indeed, we would
just define such a chain by setting its transition matrices ${}_tP$ equal to the matrix given.

A chain is called homogeneous if its transition probabilities ${}_tp_{ij}$ do not depend on $t$, that
is, ${}_tp_{ij}$ equals some $p_{ij}$ for all $t$, $i$, and $j$. In this case, we write ${}_tP = P = \|p_{ij}\|$.

There are myriad examples of Markov chains. For now, we consider four.


EXAMPLE 1. For a car insurance portfolio of a company operating in a particular area,
the intensity of daily claims depends on weather conditions. The company distinguishes
three types: normal, rainy, and icy-road conditions, leading to three states which we
label 0, 1, 2. Particular transition probabilities characterize the conditions of a season in the
area, and if the time period under consideration is not long, we may assume that these
probabilities do not depend on time. For instance, the transition matrix

$$P = \begin{Vmatrix} 0.6 & 0.3 & 0.1 \\ 0.4 & 0.5 & 0.1 \\ 0.25 & 0.7 & 0.05 \end{Vmatrix} \tag{4.1.1}$$

may characterize a mild winter. Since among the $p_{ii}$’s (the probabilities of staying in the same
state as on the previous day) the probability $p_{00} = 0.6$ is the highest, the normal condition
appears to be the most stable. The distribution $\pi_0$ characterizes the condition at the initial
time. For example, if at the beginning of the season it was raining, then $\pi_0 = (0,1,0)$.

It is important to emphasize that the above Markov model implicitly presupposes that,
given the current condition, the information on the weather conditions on previous days
is not needed for the weather forecast. In some situations, such a simplification may be
acceptable, but in general it is certainly useful to know the tendency in weather conditions
in the past. So, such a simple Markov model may not provide accurate results.

However, this does not mean that we should abandon the Markov setup. We can keep using it
if we extend the set of possible states of the weather on a current day. Namely, we should
include in the characterization of states information about the past. For example, if we
want to take into account the information about the weather on two consecutive days, we
may consider, for a current day, such states as “rainy today, and rainy yesterday”, “rainy
today, and normal yesterday”, “icy today, and rainy yesterday”, and so on. Such a model
may turn out to be adequate enough. See also Exercise 30.


EXAMPLE 2. For a group of people, let the probability that a person of age $s$ will attain
age $s+1$ be $p_s$. Set $q_s = 1 - p_s$, and consider a person of an initial age $x$ from this group
and two states: alive (more generally, intact) and deceased (more generally, failed). Since
the person chosen is alive, $\pi_0 = (1,0)$. If the person lives $t$ years more, she/he will attain
the age $x+t$, and the probability that, after that, she/he will live at least one year more is
$p_{x+t}$. Hence, the transition matrix

$${}_tP = \begin{Vmatrix} p_{x+t} & q_{x+t} \\ 0 & 1 \end{Vmatrix}.$$

The chain is not homogeneous, and, say, for $x > 30$, it is reasonable to assume that $p_{x+t}$
is decreasing in $t$. (In the range of small $s$, the probability $p_s$ may actually increase. For
example, $p_1$ may be smaller than $p_{25}$; one can theorize why this might be the case. For a
more detailed discussion on survival probabilities, see Chapter 7.)

Denote by $h_s$ the probability that if a person of age $s$ dies within a year, it happens because
of an accident. Consider three states: alive, died because of an accident, died from other
causes. In this case,

$${}_tP = \begin{Vmatrix} p_{x+t} & q_{x+t}h_{x+t} & q_{x+t}(1-h_{x+t}) \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{Vmatrix}. \tag{4.1.2}$$


EXAMPLE 3 (Rearrangement or shuffling). A stack of $k$ books lies on a desk. You take
a book at random, read what you need, and put the book on the top of the stack. The states
of the system may be identified with the orders in which the books can be arranged, that is,
with the $k!$ permutations of the numbers $1,2,\dots,k$. The reader may replace the word ‘book’ by
‘card’ and ‘stack’ by ‘deck’. In this case, we are talking about one of the simplest ways of
shuffling.

The process above cannot move from any state to any state in one step. For example, if
$k = 3$, and we number the books as 1, 2, 3, then the process can move from state (1,2,3) only to
the same state and to states (2,1,3) and (3,1,2). If all books are equally likely to be chosen,
the process can move from state (1,2,3) to each of the mentioned states with probability 1/3,
and with zero probability to the other three states: (1,3,2); (2,3,1); (3,2,1). Applying the




same argument to other states, we conclude that the transition matrix of our homogeneous
Markov chain is

$$P = \begin{array}{c|cccccc}
 & 123 & 132 & 213 & 231 & 312 & 321 \\ \hline
123 & 1/3 & 0 & 1/3 & 0 & 1/3 & 0 \\
132 & 0 & 1/3 & 1/3 & 0 & 1/3 & 0 \\
213 & 1/3 & 0 & 1/3 & 0 & 0 & 1/3 \\
231 & 1/3 & 0 & 0 & 1/3 & 0 & 1/3 \\
312 & 0 & 1/3 & 0 & 1/3 & 1/3 & 0 \\
321 & 0 & 1/3 & 0 & 1/3 & 0 & 1/3
\end{array} \tag{4.1.3}$$

(verify on your own; the row and column labels show to which state (permutation) a row and a
column correspond).

If in the beginning, all arrangements are equally likely, then $\pi_0 = (1/6, 1/6, 1/6, 1/6, 1/6,
1/6)$.
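
The matrix of the book-shuffling chain can also be generated mechanically, which is a convenient way to "verify on your own". The sketch below is only an illustration: the helper names `move_to_top` and `shuffle_matrix` are ours, and states are listed in lexicographic order, which for $k = 3$ is the order 123, 132, 213, 231, 312, 321.

```python
from fractions import Fraction
from itertools import permutations


def move_to_top(order, book):
    """Return the new arrangement after `book` is taken out and put on top."""
    return (book,) + tuple(b for b in order if b != book)


def shuffle_matrix(k):
    """Build the transition matrix of the move-to-top chain for k books."""
    states = sorted(permutations(range(1, k + 1)))
    index = {state: n for n, state in enumerate(states)}
    P = [[Fraction(0)] * len(states) for _ in states]
    for state in states:
        for book in state:                      # each book is chosen w.p. 1/k
            target = move_to_top(state, book)
            P[index[state]][index[target]] += Fraction(1, k)
    return states, P


states, P = shuffle_matrix(3)
for state, row in zip(states, P):
    print(state, [str(x) for x in row])
```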


EXAMPLE 4 (The simple random walk). Let the initial state $X_0$ be equal to an
integer $u$, and let $X_{t+1} = X_t + \xi_{t+1}$, where the $\xi_t$’s are independent r.v.’s taking values $\pm 1$ with
probabilities $p$ and $1-p$, respectively. Then $X_t = u + \xi_1 + \dots + \xi_t$.

For example, $u$ may be the initial capital of a person who, in consecutive moments of
time, either gets or loses one unit of money with the above probabilities. In this case, $X_t$
is the total capital at time $t$. We assume that the capital may be negative (the person owes
money). The reader may think about the simplest game of chance with tossing a coin,
assuming that the coin may be non-symmetric.

The process above may also be considered the simplest model of the surplus process for
a risk portfolio (see Section 1.1) in discrete time. In this case, $\xi_t$ is interpreted as the profit
of the company during the period $(t-1,t]$; that is, the premium minus the payment. In the
simplest model, we can think that it assumes only values $\pm 1$.

The term “random walk” comes from another interpretation. It concerns the motion of a 
particle which moves at each discrete moment of time either to the right or to the left by one 
unit of distance. For a particle immersed in a liquid, such a motion is connected with the 
bombardment by the molecules of the surrounding medium. (In fact, the particle is moving 
in a three-dimensional space, so we are talking about the projection of such a motion on a 
certain direction.) 

States in the random walk may be identified with integers $0,\pm 1,\pm 2,\dots$, and since the
process may move from state $i$ only to states $i+1$ or $i-1$, the transition probabilities

$$p_{i,i+1} = p, \qquad p_{i,i-1} = 1-p, \qquad p_{ij} = 0 \ \text{ for all } j \ne i \pm 1,$$

and do not depend on $t$. So, in the main diagonal of the transition matrix, we have zeros;
the diagonal above the main consists of $p$’s; and in the diagonal below the main, we have
$(1-p)$’s. All other elements are zeros. The chain is homogeneous, $\pi_0 = (0,\dots,0,1,0,\dots)$,
where 1 is in the $u$th place.

As we will see in this and the next chapters, the simple random walk is not as simple 
an object as it may seem. Many results concerning this process are typical for processes 
of a more complicated nature. For this reason, we will repeatedly return to this model and 
eventually explore it in detail. 
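
A realization of the simple random walk is easy to generate. The following is a minimal sketch, not a prescribed procedure: the function name `random_walk`, the seed, and the parameter values are arbitrary choices made for illustration.

```python
import random


def random_walk(u, steps, p, rng):
    """Return the path X_0, ..., X_steps of the simple random walk started at u."""
    path = [u]
    for _ in range(steps):
        xi = 1 if rng.random() < p else -1   # step +1 w.p. p, step -1 w.p. 1 - p
        path.append(path[-1] + xi)
    return path


rng = random.Random(0)   # fixed seed so that the run is reproducible
path = random_walk(u=5, steps=20, p=0.5, rng=rng)
print(path)
```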




Next, we consider transitions in many steps. Let us specify particular paths using the
symbol $\to$. For example, writing $1 \to 2 \to 1 \to 3$, we will mean that the chain, starting
from state 1, moved to state 2, then returned to state 1, then moved to state 3. For a
homogeneous chain with a transition matrix $P = \|p_{ij}\|$, the probability of such a path is
$p_{12}p_{21}p_{13}$. Other simple examples are given in Exercise 33.

Consider for now a homogeneous chain and denote by $p_{ij}^{(2)}$ the probability of moving
from state $i$ to state $j$ in two steps, not specifying an intermediate state. Set $P^{(2)} = \|p_{ij}^{(2)}\|$.
To compute $p_{ij}^{(2)}$, we should consider all possible two-step paths $i \to k \to j$, where the
intermediate state $k$ is arbitrary. The probability of each such path is $p_{ik}p_{kj}$, and hence
$p_{ij}^{(2)} = \sum_k p_{ik}p_{kj}$. The r.-h.s. of the last formula is the $(i,j)$-element of the matrix product
$PP = P^2$. Thus,

$$P^{(2)} = P^2.$$


Skipping similar simple calculations in the general case (the reader may apply induction) 
and restricting ourselves to the homogeneous chain case, we state the following 


Proposition 2 Let the chain under consideration be homogeneous and
$p_{ij}^{(n)} = P(X_{t+n} = j \mid X_t = i)$, the probability of moving to state $j$ in $n$ steps
starting from state $i$. (Since the chain is homogeneous, this conditional probability does not
depend on $t$.) Let $P^{(n)} = \|p_{ij}^{(n)}\|$, the corresponding transition probability matrix. Then

$$P^{(n)} = P^n, \tag{4.1.4}$$

where $P^n$ is the $n$th power of the matrix $P$.

> In the non-homogeneous case, we set ${}_tp_{ij}^{(n)} = P(X_{t+n} = j \mid X_t = i)$, and ${}_tP^{(n)} = \|{}_tp_{ij}^{(n)}\|$,
the corresponding transition matrix. Then

$${}_tP^{(n)} = {}_tP \cdot {}_{t+1}P \cdots {}_{t+n-1}P, \tag{4.1.5}$$

where the symbol $\cdot$ denotes the multiplication of matrices. <
Below, unless stated otherwise, we restrict ourselves to homogeneous chains. 


EXAMPLE 5. Consider rearrangements (shuffling) of three books from Example 3. As
is easy to verify, for $P$ from (4.1.3),

$$P^2 = \begin{array}{c|cccccc}
 & 123 & 132 & 213 & 231 & 312 & 321 \\ \hline
123 & 2/9 & 1/9 & 2/9 & 1/9 & 2/9 & 1/9 \\
132 & 1/9 & 2/9 & 2/9 & 1/9 & 2/9 & 1/9 \\
213 & 2/9 & 1/9 & 2/9 & 1/9 & 1/9 & 2/9 \\
231 & 2/9 & 1/9 & 1/9 & 2/9 & 1/9 & 2/9 \\
312 & 1/9 & 2/9 & 1/9 & 2/9 & 2/9 & 1/9 \\
321 & 1/9 & 2/9 & 1/9 & 2/9 & 1/9 & 2/9
\end{array} \tag{4.1.6}$$


Thus, all two-step transitions have positive probabilities. 
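
The two-step matrix of this example can be recomputed in exact arithmetic. The sketch below hard-codes the one-step matrix of the three-book chain (states ordered 123, 132, 213, 231, 312, 321) and squares it; the helper name `mat_mul` is ours.

```python
from fractions import Fraction

t = Fraction(1, 3)
# One-step transition matrix of the three-book shuffling chain;
# rows and columns are ordered 123, 132, 213, 231, 312, 321.
P = [
    [t, 0, t, 0, t, 0],
    [0, t, t, 0, t, 0],
    [t, 0, t, 0, 0, t],
    [t, 0, 0, t, 0, t],
    [0, t, 0, t, t, 0],
    [0, t, 0, t, 0, t],
]


def mat_mul(X, Y):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


P2 = mat_mul(P, P)   # two-step transition probabilities
for row in P2:
    print("  ".join(str(x) for x in row))
```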


In Exercise 32, we consider Examples 1-2 in the same context.




From (4.1.4), we get that

$$P^{(n+m)} = P^{(n)}P^{(m)}. \tag{4.1.7}$$

Equation (4.1.7) is called the Chapman-Kolmogorov equation.

Now, let $\pi_{ti} = P(X_t = i)$, and $\pi_t = (\pi_{t0}, \pi_{t1}, \pi_{t2}, \dots)$, the distribution of the process at time $t$.
By the formula for total probability,

$$\pi_{1i} = P(X_1 = i) = \sum_k P(X_1 = i \mid X_0 = k)P(X_0 = k) = \sum_k p_{ki}\pi_{0k}.$$

In the vector form, it may be written as

$$\pi_1 = \pi_0P, \tag{4.1.8}$$

where a row vector is multiplied on the right by a square matrix.
Taking into account (4.1.4), we can write a similar formula for an arbitrary time $t$, stating
it as the following proposition.

Proposition 3 In the homogeneous case,

$$\pi_t = \pi_0P^t. \tag{4.1.9}$$


EXAMPLE 6. Assume that in the situation of Example 3, $k = 3$ and in the beginning,
the first book is on the top, while the second book may be equally likely either in the
second or third position. That is, $\pi_0 = (1/2,1/2,0,0,0,0)$. Using (4.1.6) and (4.1.9),
it is straightforward to compute that $\pi_2 = \pi_0P^2 = (1/6,1/6,2/9,1/9,2/9,1/9)$. It is
interesting to compare $\pi_0$ and $\pi_2$ and guess the further tendency. It will be determined in
Section 4.4.
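
To follow the evolution of the distribution step by step, one may multiply the row vector by the transition matrix repeatedly. A minimal sketch in exact arithmetic, with the one-step matrix of the three-book chain written out explicitly (states ordered 123, 132, 213, 231, 312, 321) and a helper `vec_mat` of our own:

```python
from fractions import Fraction

t = Fraction(1, 3)
# One-step transition matrix of the three-book shuffling chain.
P = [
    [t, 0, t, 0, t, 0],
    [0, t, t, 0, t, 0],
    [t, 0, t, 0, 0, t],
    [t, 0, 0, t, 0, t],
    [0, t, 0, t, t, 0],
    [0, t, 0, t, 0, t],
]


def vec_mat(v, M):
    """Multiply a row vector by a matrix on the right."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


pi0 = [Fraction(1, 2), Fraction(1, 2), 0, 0, 0, 0]
pi1 = vec_mat(pi0, P)    # distribution after one step
pi2 = vec_mat(pi1, P)    # distribution after two steps
print([str(x) for x in pi2])
```

Repeating the last step a few more times gives a feeling for the tendency mentioned in the example.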

Next, we consider simulation of a Markov chain. 

EXAMPLE 7. Return to Example 1. A simulation procedure is illustrated in the Excel
worksheet in Fig.7. The initial state is a free input entry in Cell D1. We chose the value 2.
The matrix $P$ is in the array F25:H27. The random numbers generated are placed in Column
A, the time moments $t$ in Column C, and the values of $X_t$ in Column D.

Since the initial state is 2, in the first step, we should consider the probabilities in the
third row of the matrix $P$, that is, 0.25, 0.7, 0.05. To simulate the motion of the chain in
the first step, we generate a random number $Z$. If $Z$ is less than 0.25 (which occurs with
probability 0.25), then the process moves to state 0. If $0.25 \le Z < 0.95$ (which occurs with
probability 0.7), it moves to state 1. Otherwise, the process stays in state 2. In the particular
realization in Fig.7, the value $Z \approx 0.893$ is in Cell A2. So $X_1 = 1$, and it is reflected in Cell
D2. The command for Cell D2 is

=IF(D1=0, IF(A2<$F$25,0,IF(A2<$F$25+$G$25,1,2)), IF(D1=1, IF(A2<$F$26,0,
IF(A2<$F$26+$G$26,1,2)), IF(A2<$F$27,0,IF(A2<$F$27+$G$27,1,2))))

After the first step, we proceed similarly, depending on the current state. For example,
since $X_1$ happened to be one, to simulate $X_2$, we should consider the second row in the
matrix $P$ and generate another number from [0,1]. It is in Cell A3; we proceed with the
command in D3 similar to that in D2, and continue in the same fashion. The realization we


[Figure 7: Excel worksheet with the initial state in Cell D1, the generated random numbers in Column A, the time moments in Column C, the states of the chain in Column D, and a chart of the realization of the chain.]

FIGURE 7. Simulation of the Markov chain from Examples 1,7 in Section 4.1 


got in Column D is presented in the chart. Picking other numbers for Column A, we will
get another realization.

Readers who are familiar with Excel will realize that the commands in Column D were
entered by copying a single command down the column, and all numbers in Column A were
also generated by one command.
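
The same simulation scheme can be written outside a spreadsheet. The following Python sketch mirrors the rule described above (compare a uniform random number with the cumulative sums of the current row); the function names `next_state` and `simulate` and the seed are our own choices, not part of the worksheet.

```python
import random

# Transition matrix of the weather chain (states 0, 1, 2).
P = [
    [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1],
    [0.25, 0.7, 0.05],
]


def next_state(state, z, matrix):
    """Pick the next state from a uniform number z in [0, 1)."""
    cumulative = 0.0
    for j, prob in enumerate(matrix[state]):
        cumulative += prob
        if z < cumulative:
            return j
    return len(matrix[state]) - 1   # guard against rounding in the last sum


def simulate(initial_state, steps, matrix, rng):
    """Return one realization X_0, ..., X_steps of the chain."""
    path = [initial_state]
    for _ in range(steps):
        path.append(next_state(path[-1], rng.random(), matrix))
    return path


rng = random.Random(2024)   # arbitrary seed, fixed for reproducibility
print(simulate(2, 30, P, rng))
```

With the random number 0.8927274 from the worksheet and the initial state 2, `next_state` returns 1, in agreement with the value of $X_1$ found above.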


4.2 Variables defined on a Markov chain. Cash flows 


For the most part, this section concerns modeling cash flows. Additional examples of 
applications may be found, e.g., in [30]. We assume all chains under consideration to be 
finite (that is, having a finite number of states) and, unless stated otherwise, homogeneous. 


4.2.1 Variables defined on states 


Consider a Markov chain $X_t$. For each time moment $t$, we assign to each state $i$ a random
variable $Y_{ti}$. We assume that, given $i$, the r.v. $Y_{ti}$ does not depend on other r.v.’s involved in
the model.


EXAMPLE 1. As in the situation of Example 4.1-1, $X_t$ is the indicator of the weather
condition at time $t$, and $Y_{ti}$ is the number of claims in period $t$, given that the weather
conditions correspond to state $i$, that is, given $X_t = i$.




EXAMPLE 2. $X_t$ indicates the health condition of a person in annual period $t$, and $Y_{ti}$ is
the (random) annual health care cost for year $t$ if $X_t = i$.


Which particular r.v. $Y_{ti}$ “appears” at time $t$ depends on the state at which the process will
arrive at time $t$, that is, on the value of $X_t$. For example, if $X_t$ assumes a value of 3, we
consider $Y_{t3}$, while if $X_t = 6$, we consider $Y_{t6}$. So, as a matter of fact, the index $i$ in $Y_{ti}$ is
random, and when modeling the evolution of the system as a whole, we should replace the
index $i$ by $X_t$.

Thus, we define the r.v.

$$Z_t = Y_{tX_t}.$$

In Example 1, it would represent the number of claims; in Example 2, the cost in the
period $t$. Such r.v.’s are said to be defined on a Markov chain.
Let $F_{ti}(x) = P(Y_{ti} \le x)$. Then

$$P(Z_t \le x) = \sum_i P(Z_t \le x \mid X_t = i)P(X_t = i) = \sum_i P(Y_{ti} \le x)\pi_{ti} = \sum_i \pi_{ti}F_{ti}(x).$$

Thus, the distribution of $Z_t$ is the mixture of the distributions $F_{ti}$ with respect to the
distribution $\pi_t$.


EXAMPLE 3. Consider the situation of Example 1 above with the transition matrix from
Example 4.1-1. Assume that $Y_{ti}$ is a Poisson r.v. with parameter $\lambda_i$. Let $\lambda_0 = 2$, $\lambda_1 = 4$,
$\lambda_2 = 8$. Given that at the initial time the weather conditions are normal, find the probability
that during the second day there will be at most 3 claims.

Thus, $\pi_0 = (1,0,0)$, and by (4.1.9), $\pi_2 = \pi_0P^2$, where $P$ is given in (4.1.1). So, we
easily compute, using any software we like, that $\pi_2 = (0.505, 0.4, 0.095)$. Then, using the
same symbol PoissonDist as in Example 2.1-1b, we have $P(Z_2 \le 3) = \text{PoissonDist}(3,2)\cdot
0.505 + \text{PoissonDist}(3,4)\cdot 0.4 + \text{PoissonDist}(3,8)\cdot 0.095 \approx 0.857\cdot 0.505 + 0.433\cdot 0.4 +
0.042\cdot 0.095 \approx 0.610$.
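
For readers who prefer a script to a spreadsheet, the computation of this example may be sketched as follows. The helper `poisson_cdf` plays the role of the symbol PoissonDist; it and `vec_mat` are our own names.

```python
from math import exp, factorial

P = [
    [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1],
    [0.25, 0.7, 0.05],
]
lam = [2.0, 4.0, 8.0]     # Poisson parameters attached to states 0, 1, 2
pi0 = [1.0, 0.0, 0.0]     # normal weather at the initial time


def vec_mat(v, M):
    """Multiply a row vector by a matrix on the right."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def poisson_cdf(n, mean):
    """P(N <= n) for a Poisson r.v. N with the given mean."""
    return sum(exp(-mean) * mean ** j / factorial(j) for j in range(n + 1))


pi2 = vec_mat(vec_mat(pi0, P), P)   # distribution of the state on the second day
prob = sum(w * poisson_cdf(3, mean) for w, mean in zip(pi2, lam))
print([round(x, 3) for x in pi2], round(prob, 3))
```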


Let

$$S_n = \sum_{t=0}^{n} Z_t = \sum_{t=0}^{n} Y_{tX_t}.$$

In the above examples, $S_n$ represents either the total number of claims or the total payment
during $n+1$ periods of time (we count the initial period).

Computing the distribution of $S_n$ is complicated and we restrict ourselves to $E\{S_n\}$. The
matrix notation allows us to do that in a nice compact form.

Let $c_{ti} = E\{Y_{ti}\}$, and the vector $c_t = (c_{t1}, c_{t2}, \dots)$. In the situation of Example 2, the vector $c_t$
characterizes the possible cash flow at time $t$. We have

$$E\{Z_t\} = \sum_i E\{Y_{ti}\}P(X_t = i) = \sum_i c_{ti}\pi_{ti} = (c_t, \pi_t),$$

where $(\cdot,\cdot)$ stands for scalar (or dot) product. Then, in view of (4.1.9), $E\{Z_t\} = (c_t, \pi_0P^t)$,
and

$$E\{S_n\} = \sum_{t=0}^{n} (c_t, \pi_0P^t). \tag{4.2.1}$$

(By convention, $P^0$ is the identity matrix of the corresponding size.)




First, consider the case when $c_t = c = (c_1, c_2, \dots)$, that is, does not depend on $t$. Then

$$E\{S_n\} = \sum_{t=0}^{n} (c, \pi_0P^t) = \Big(c, \pi_0\sum_{t=0}^{n} P^t\Big) = (c, \pi_0K_n), \tag{4.2.2}$$

where the matrix

$$K_n = \sum_{t=0}^{n} P^t.$$

This is a geometric series of matrices, so it is tempting to use the formula for the sum of
such a series.
For any number $r \ne 1$, we have $1 + r + \dots + r^n = (1-r)^{-1}(1-r^{n+1}) \to (1-r)^{-1}$ as
$n \to \infty$, if $|r| < 1$. For a matrix $B$ and the identity matrix $I$, we do have

$$I + B + \dots + B^n = (I-B)^{-1}(I - B^{n+1}) \to (I-B)^{-1} \ \text{ as } n \to \infty, \tag{4.2.3}$$

if the inverse matrix $(I-B)^{-1}$ exists. The limiting relation is true provided that the
absolute values of all eigenvalues of $B$ are less than one (see, e.g., [61], [79]).

However, we cannot use (4.2.3) in the situation above because $\det(I-P) = 0$, and
hence $(I-P)^{-1}$ does not exist. Indeed, let $e = (1,1,\dots,1)$, so the transpose $e^T$ is the
corresponding column vector. Since the sum of the probabilities in each row of $P$ is one,
$Pe^T = e^T$, that is, one is an eigenvalue of $P$. This means that $\det(I-P) = 0$.

In the next sections, we consider situations where (4.2.3) proves to be useful, but so far
we have to consider $K_n$ as is.

EXAMPLE 4. We return to the situation of Example 3. Now, we find $E\{S_2\}$, the
expected total number of claims during three periods (we count the initial period). In our
case, $c = (\lambda_0, \lambda_1, \lambda_2) = (2,4,8)$, and by (4.2.2),

$$E\{S_2\} = (c, \pi_0(I + P + P^2)),$$

where $\pi_0 = (1,0,0)$, and $P$ is given in (4.1.1). Calculations are easy, especially if you use
software, and lead to $E\{S_2\} = 8.57$.
Calculations for larger $n$’s are similar and tractable with a good computer program.
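
A short program of the kind alluded to here might look as follows. It is only a sketch: instead of forming the powers of $P$, it updates the distribution $\pi_t$ step by step and accumulates the scalar products; the function name `expected_total` is ours.

```python
P = [
    [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1],
    [0.25, 0.7, 0.05],
]
c = [2.0, 4.0, 8.0]       # mean number of claims in states 0, 1, 2
pi0 = [1.0, 0.0, 0.0]     # initial distribution


def vec_mat(v, M):
    """Multiply a row vector by a matrix on the right."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def expected_total(c, pi0, P, n):
    """Sum of (c, pi_0 P^t) over t = 0, ..., n."""
    total, pi_t = 0.0, list(pi0)
    for _ in range(n + 1):
        total += sum(ci * pi for ci, pi in zip(c, pi_t))
        pi_t = vec_mat(pi_t, P)    # pi_{t+1} = pi_t P
    return total


print(round(expected_total(c, pi0, P, 2), 2))
```

The same function handles larger $n$ without any change.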


4.2.2 Mean discounted payments 


The notions of discount and present value we use here are introduced in Section 0.8.3.
Briefly, the discount factor $v_t$ is the value of one unit of money to be paid at time $t$ if the
evaluation is carried out from the standpoint of the present initial time zero. In the model
below, we adopt the simplest representation

$$v_t = v^t,$$

where $v$ is a given discount factor for a unit of time. We assume $0 < v < 1$.

Let us return to the model of Section 4.2.1 and formula (4.2.1). As before, we suppose
that the expected payment vector at time $t$ is a constant vector $c$. However, now we are
evaluating the present value of this future payment from the standpoint of the initial time.
The present value mentioned is $v^tc$, and in our model, we should set $c_t = v^tc$. Inserting it
into (4.2.1), we have

$$E\{S_n\} = \sum_{t=0}^{n} (v^tc, \pi_0P^t) = \sum_{t=0}^{n} (c, v^t\pi_0P^t) = \sum_{t=0}^{n} (c, \pi_0(vP)^t). \tag{4.2.4}$$




The last quantity, the expected present value of the cash flow, is called in the actuarial
literature an actuarial present value. The approach in (4.2.4) is sometimes called the
“triple-product-summation approach (31-approach)” since we multiply in $(v^tc, \pi_0P^t)$ the
discount, the amount to be paid, and the probability that a particular state occurs, and then
we add all products up.

From (4.2.4), we have

$$E\{S_n\} = \Big(c, \pi_0\sum_{t=0}^{n} (vP)^t\Big). \tag{4.2.5}$$


In the previous section, we saw that one is an eigenvalue of $P$. We now use the fact that
no eigenvalue of the stochastic matrix $P$ exceeds one in absolute value; see, e.g., [95, pp.11,
141], [61].


(This fact may be derived, for example, from the following theorem. In the complex number
space, consider the circles whose centers are the diagonal elements $p_{ii}$ and whose radii are the sums
of the non-diagonal elements, that is, $\sum_{j:\, j\ne i} p_{ij}$, in the corresponding rows. Then all eigenvalues
belong to the union of these circles; see, e.g., [61]. For our stochastic matrix, no point in these
circles exceeds one in absolute value. Another proof is connected with the Perron-Frobenius
theorem; see, e.g., [95, pp.11, 141].)


The fact mentioned means that $\det(\lambda I - P)$ may be zero only for $|\lambda| \le 1$. Writing, for $\lambda \ne 0$,
$\det(\lambda I - P) = \lambda^d\det(I - \frac{1}{\lambda}P) = \lambda^d\det(I - vP)$, where $v = 1/\lambda$ and $d$ is the number of
states, we see that $\det(I - vP)$ may be equal to zero only for $|v| \ge 1$. Hence, if $0 < v < 1$, we can
apply formula (4.2.3) with $B = vP$, writing

$$E\{S_n\} = (c, \pi_0(I - vP)^{-1}(I - (vP)^{n+1})). \tag{4.2.6}$$


If $n$ is not large, we should stop and use the above formula. However, if we consider the
process in the long run (for large $n$), we may use the approximation

$$\sum_{t=0}^{n} (vP)^t \to (I - vP)^{-1} \ \text{ as } n \to \infty,$$

which implies that in this case, the discounted mean cash flow

$$E\{S_n\} \to (c, \pi_0(I - vP)^{-1}) \ \text{ as } n \to \infty. \tag{4.2.7}$$


In Section 4.3.1, we consider another way of obtaining limiting relations of this type. 

In Exercise 41, we evaluate the present value of the payments in the situation of Example 
4.2.1-2 for a given discount factor. Next, to cover different situations, we shall consider a 
somewhat different example. 


EXAMPLE 1. An investor distinguishes five types of years: very bad, bad, moderate,
good, and very good. The mean profit corresponding to the states mentioned is equal, in
some units, to $-3, -1, 1, 2, 4$, respectively. The discount factor is $v = 0.97$.

Suppose that the change of investment conditions from year to year is well approximated
by the homogeneous Markov model with the transition matrix

$$P = \begin{Vmatrix} 0.1 & 0.7 & 0.2 & 0 & 0 \\ 0.2 & 0.2 & 0.6 & 0 & 0 \\ 0.05 & 0.1 & 0.7 & 0.1 & 0.05 \\ 0 & 0.1 & 0.1 & 0.6 & 0.2 \\ 0 & 0 & 0.2 & 0.5 & 0.3 \end{Vmatrix}. \tag{4.2.8}$$

The present condition of the market is moderate. Estimate the mean present value of the
profit in the long run.

Thus, $\pi_0 = (0,0,1,0,0)$, $c = (-3,-1,1,2,4)$, and the estimate of $E\{S_n\}$ for large $n$ is
$(c, \pi_0(I - vP)^{-1}) = 36.008$, which may be easily computed using the matrix commands
MMULT and MINVERSE in Excel. In Exercise 39, the reader is invited to consider the
solution for different values of $v$. In Exercise 40, we consider this model for a finite $n$.


4.2.3 The case of absorbing states 


A state $i$ is called absorbing if $p_{ii} = 1$.

For instance, in Example 4.1-2, state 1 (deceased) is absorbing. If in a life insurance
contract, the situations when the insured dies and when she/he ceases paying premiums
(and the contract is canceled) are considered separately, then we have two absorbing states.

Consider a chain for which states $i = 0,1,\dots,k$ are non-absorbing, and the last $r$ states
are absorbing, that is, $p_{ii} = 1$ for $i = k+1,\dots,k+r$. Then, the transition matrix

$$P = \begin{Vmatrix} A & B \\ 0 & I \end{Vmatrix}, \tag{4.2.9}$$

where $0$ is a matrix with zero elements, $I$ is an $r \times r$ identity matrix, $A$ is some
$(k+1)\times(k+1)$-matrix, and $B$ is a $(k+1)\times r$ matrix.

Assume that for each non-absorbing state, the probability of moving to some absorbing
state is positive: for each $i = 0,1,\dots,k$, there is a state $j$ from the set $\{k+1,\dots,k+r\}$ such
that $p_{ij} > 0$. This implies, in particular, that the sum of all probabilities in each row of the
matrix $A$ is less than one.

Since at each step the process can move to an absorbing state with a positive probability,
at some random but finite time $T$, the process will arrive at an absorbing state and never
leave it (get stuck in it). Intuitively it is clear; we will show it rigorously at the end of this
subsection.

The matrix $I$ from (4.2.9) will not participate in calculations below. Let now the symbol
$I$ denote the $(k+1)\times(k+1)$ identity matrix, that is, the matrix of the same size as $A$.

An essential circumstance in what we will do below is the fact that $\det(I - vA) \ne 0$
not only for $v < 1$, but for $v = 1$ too. Therefore, though the model below involves a discount
factor $v$, we may consider the case $v = 1$ also, which will allow us to explore, for example,
the time of absorption or the situations when the r.v.’s $Y_{ti}$ are not payments but, say, the
numbers of claims. So, we use below the word “payment” only for definiteness.


(To show that $\det(I - A) \ne 0$, set $b_i$ equal to the sum of all elements in the $i$th row of $A$. We
have assumed that $b_i < 1$. Skipping the trivial case $b_i = 0$, we divide each row of $A$ by the
corresponding $b_i$. Denote the resulting matrix by $K$. Clearly, $K$ is a stochastic matrix,
and $A = SK$, where $S$ is the diagonal matrix with the $b_i$’s in the diagonal. Since all $b_i < 1$
and no eigenvalue of $K$ exceeds one in absolute value, all eigenvalues of $SK$ are less than one
in absolute value.)


For the payment vector $c = (c_0,\dots,c_{k+r})$, we assume that $c_i = 0$ for $i > k$ (no
payments in absorbing states). Set $\tilde{c} = (c_0, c_1, \dots, c_k)$, the vector of “real” payments, and set
$\tilde{\pi}_0 = (\pi_{00}, \pi_{01}, \dots, \pi_{0k})$. Consider the expression $(c, v^t\pi_0P^t)$ in (4.2.5).




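Since the last $r$ coordinates in $c$ are zeros, the last $r$ coordinates of the vector $\pi_0P^t$ do not
matter. On the other hand, in view of (4.2.9), the first $k+1$ coordinates of $\pi_0P^t$ constitute
the vector $\tilde{\pi}_0A^t$. Consequently,

$$(c, v^t\pi_0P^t) = (\tilde{c}, v^t\tilde{\pi}_0A^t).$$

Making use of (4.2.3), we have

$$E\{S_n\} = \sum_{t=0}^{n} (\tilde{c}, v^t\tilde{\pi}_0A^t) = \Big(\tilde{c}, \tilde{\pi}_0\sum_{t=0}^{n} (vA)^t\Big) = (\tilde{c}, \tilde{\pi}_0(I - vA)^{-1}(I - (vA)^{n+1})). \tag{4.2.10}$$

From this it follows that

$$E\{S_n\} \to (\tilde{c}, \tilde{\pi}_0(I - vA)^{-1}) \ \text{ as } n \to \infty. \tag{4.2.11}$$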


In Section 4.3.1, we arrive at relations of this type in a different manner. 

Let us discuss the significance of the last relation. As was mentioned, with probability
one, absorption will happen at some random but finite time $T$. After absorption, there are
no payments, and accordingly, $S_n$ will not change after the absorption time; that is, $S_n = S_T$
for all $n \ge T$. Then the limiting r.v. $S_\infty = \lim_{n\to\infty} S_n$ is just $S_T$, that is, the total discounted
cash flow until absorption. Formula (4.2.11) gives the expected total discounted cash flow,
i.e., $E\{S_T\}$. (Certainly, we may reason in this way if there is no restriction on the duration
of the process.)


EXAMPLE 1. Consider a medical insurance model with four states for an insured:
healthy, sick, ceased paying, deceased. The transition matrix

$$P = \begin{Vmatrix} 0.9 & 0.05 & 0.01 & 0.04 \\ 0.1 & 0.8 & 0.01 & 0.09 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{Vmatrix} \tag{4.2.12}$$

gives transition probabilities corresponding to annual transition periods. For example, 0.1 is
the probability that an insured being sick at the beginning of a year will recover by the
beginning of the next year.

Let the mean annual health care costs corresponding to the first two states, ‘healthy’ and
‘sick’, be equal to 1 and 4, respectively. That is, $c = (1,4,0,0)$, and $\tilde{c} = (1,4)$. Assume
also that at the initial moment, 94% of clients of the company are healthy, that is, $\pi_0 =
(0.94, 0.06, 0, 0)$, and $\tilde{\pi}_0 = (0.94, 0.06)$. Estimate the expected total costs for the discount
$v = 0.97$.

We have

$$A = \begin{Vmatrix} 0.9 & 0.05 \\ 0.1 & 0.8 \end{Vmatrix}, \qquad (I - vA)^{-1} \approx \begin{Vmatrix} 9.434 & 2.043 \\ 4.085 & 5.349 \end{Vmatrix},$$

and

$$\tilde{\pi}_0(I - vA)^{-1} \approx (9.113, 2.241).$$

Thus, by (4.2.11),

$$E\{S_n\} \to (\tilde{c}, \tilde{\pi}_0(I - vA)^{-1}) \approx 1\cdot 9.113 + 4\cdot 2.241 = 18.077.$$
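
The numbers of this example can be reproduced with a short script. The sketch below is one possible way to do it, using a small Gauss-Jordan routine of our own (`inverse`) rather than a library call; the variable names are ours as well.

```python
def inverse(M):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


v = 0.97
A = [[0.9, 0.05], [0.1, 0.8]]   # transitions among the non-absorbing states
c = [1.0, 4.0]                  # mean annual costs: healthy, sick
pi0 = [0.94, 0.06]              # initial distribution on the non-absorbing states

n = len(A)
R = inverse([[(i == j) - v * A[i][j] for j in range(n)] for i in range(n)])
weights = [sum(pi0[i] * R[i][j] for i in range(n)) for j in range(n)]
apv = sum(ci * wi for ci, wi in zip(c, weights))   # limit of E{S_n}
print([round(x, 3) for x in weights], round(apv, 3))
```

Setting `v = 1.0` and `c = [1.0, 1.0]` in the same script yields an expected number of steps rather than a discounted cost.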




It is interesting that from (4.2.11) with $v = 1$, we can immediately obtain the expected
absorption time, that is, the expected number of steps until absorption.

Indeed, let us set $\tilde{c} = e = (1,1,\dots,1)$. Then, at each time until the moment of absorption,
the payment equals one. Consequently, if a time moment $n$ precedes the moment of
absorption, the variable $S_n$ is equal to the number of periods elapsed. (We count time 0.) At the
moment of absorption, there is no payment, and hence at this moment, $S_n$ is exactly equal
to the moment of absorption. After absorption, there are no payments, and accordingly, $S_n$
will not change after this point. Eventually, due to the choice of $\tilde{c}$,

$$E\{T\} = E\{S_\infty\} = (e, \tilde{\pi}_0(I - A)^{-1}).$$

To rewrite this in a more convenient form, we use two elementary facts:

(1) if a vector $\mu$ is a row-vector, then its transpose $\mu^T$ is the same vector presented as a
column;

(2) $(a,b) = ab^T$ for any row-vectors $a$, $b$ of the same dimension.

Let $\mu$ be a vector such that $\mu^T = (I - A)^{-1}e^T$. Then

$$E\{T\} = \tilde{\pi}_0(I - A)^{-1}e^T = \tilde{\pi}_0\mu^T = (\tilde{\pi}_0, \mu). \tag{4.2.13}$$

Note that, since $E\{T\}$ is a finite number, we proved along the way that $T$ is finite with
probability one.

Let $\tilde{\pi}_0 = (0,\dots,0,1,0,\dots,0)$, where 1 corresponds to state $i$. Then from (4.2.13) it follows
that $\mu_i$, the $i$th coordinate of $\mu$, is $E\{T \mid X_0 = i\}$.


EXAMPLE 2. For the data from Example 1,

$$(I - A)^{-1} = \begin{Vmatrix} 40/3 & 10/3 \\ 20/3 & 20/3 \end{Vmatrix}, \quad \text{and} \quad \mu^T = (I - A)^{-1}e^T = \begin{Vmatrix} 50/3 \\ 40/3 \end{Vmatrix}.$$

So, the expected lifetime for a sick person from the population under consideration
constitutes 0.8 of the same time for a healthy person (the ratio of their expected lifetimes). If,
as in Example 1, at the initial time, 94% of the population are healthy, then $\tilde{\pi}_0 = (0.94, 0.06)$,
and for a randomly chosen client, $E\{T\} = (\tilde{\pi}_0, \mu) = 0.94\cdot\frac{50}{3} + 0.06\cdot\frac{40}{3} \approx 16.47$.
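
For two non-absorbing states, the expected absorption times can be checked in exact arithmetic. The sketch below uses the explicit formula for the inverse of a 2x2 matrix, so it is specific to this small case; the variable names are ours.

```python
from fractions import Fraction as F

# Transitions among the non-absorbing states (healthy, sick).
A = [[F(9, 10), F(1, 20)],
     [F(1, 10), F(4, 5)]]
pi0 = [F(94, 100), F(6, 100)]

# Entries of I - A and its inverse via the 2x2 adjugate formula.
a, b = 1 - A[0][0], -A[0][1]
c, d = -A[1][0], 1 - A[1][1]
det = a * d - b * c
inv = [[d / det, -b / det],
       [-c / det, a / det]]

mu = [sum(row) for row in inv]     # mu^T = (I - A)^{-1} e^T: row sums of the inverse
expected_T = sum(p * m for p, m in zip(pi0, mu))
print([str(x) for x in mu], str(expected_T), float(expected_T))
```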


4.2.4 Variables defined on transitions 


In some situations, variables defined on a Markov chain (say, cash flows) may not be
determined by the state in which the process is at a current moment, but rather by the
transition the process has made. For example, in the case of life insurance, the company
pays only one time when the transition “alive” $\to$ “deceased” occurs.

For such a case, we replace the r.v.’s $Y_{ti}$ from above by r.v.’s $Y_{t(ij)}$, assigning $Y_{t(ij)}$ to the
transition $i \to j$ if it occurred during the period from time $t-1$ to time $t$. For brevity, we
interpret $Y_{t(ij)}$ below as a payment, though other interpretations are also possible.

Assume that $E\{Y_{t(ij)}\} = v^tc_{ij}$, where $v$ is a discount factor, and the mean payment $c_{ij}$
does not depend on $t$. Let the matrix $C = \|c_{ij}\|$. Given that the process at time $t-1$ is
in state $i$, the expected payment in one step, not involving discount, is equal to $\sum_j c_{ij}p_{ij}$.
Denote this number by $c_{Pi}$, and set the vector $c_P = (c_{P1},\dots,c_{Pk})$, where $k$ is the number of
states of the chain.




[One may note that $c_{Pi}$ is the ith diagonal element of the matrix $CP^T$, where $P^T$ is the transpose of $P$, that is,

$$\mathbf{c}_P = \mathrm{diagonal}(CP^T).$$

This may be of help in providing a program for calculations.]

Thus, the expected value of the payment at time t is

$$v^t \sum_i c_{Pi} P(X_{t-1} = i) = v^t \sum_i c_{Pi}\,\pi_{t-1,i} = v^t(\mathbf{c}_P, \boldsymbol{\pi}_{t-1}) = v^t(\mathbf{c}_P, \boldsymbol{\pi}_0 P^{t-1}).$$


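Here and below $t = 1, 2, \ldots$, because there is no payment at time 0.

So, if $S_n$ is the total discounted payment during n periods, then

$$E\{S_n\} = \sum_{t=1}^{n} v^t(\mathbf{c}_P, \boldsymbol{\pi}_0 P^{t-1}) = v\sum_{t=1}^{n} (\mathbf{c}_P, \boldsymbol{\pi}_0 (vP)^{t-1}) = v\Big(\mathbf{c}_P, \boldsymbol{\pi}_0 \sum_{t=1}^{n} (vP)^{t-1}\Big). \qquad (4.2.14)$$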


If $v = 1$, we should stop at this stage. This is the case when we do not take into account a discount, or we view $Y_{t(ij)}$ not as payments but as the numbers of claims, for example.

If $v < 1$, we can continue, writing

$$E\{S_n\} = v \cdot (\mathbf{c}_P, \boldsymbol{\pi}_0 (I - vP)^{-1}(I - (vP)^n)), \qquad (4.2.15)$$

and

$$E\{S_n\} \to v \cdot (\mathbf{c}_P, \boldsymbol{\pi}_0 (I - vP)^{-1}) \quad \text{as } n \to \infty. \qquad (4.2.16)$$

Thus, the only difference between representations (4.2.6) and (4.2.15) is that we replaced $\mathbf{c}$ by $\mathbf{c}_P$, and instead of $(vP)^{n+1}$ in (4.2.6), we wrote $(vP)^n$ in (4.2.15).


EXAMPLE 1. Consider the situation of Example 4.2.3-1 with the transition matrix (4.2.12). Assume that we deal with a life insurance contract paying a unit of money (say, $100,000) upon the death of the insured. As in the example mentioned, we assume that at the initial moment, 94% of the population under consideration is healthy. So, for a randomly chosen client, π_0 = (0.94, 0.06, 0, 0). Let v = 0.97.

A payment is provided only in the case of transitions 0 → 3 or 1 → 3. Consequently, from the concrete representation (4.2.12) it follows that c_{P1} = 1 · 0.04, c_{P2} = 1 · 0.09, and c_{P3} = c_{P4} = 0. So, c_P = (0.04, 0.09, 0, 0).

We can use the structure of P as we did in Example 4.2.3-1, but we do not need it. If we restrict ourselves to the approximation (4.2.16), we can easily compute v(c_P, π_0 (I − vP)^{-1}) directly by using, say, Excel or other software. In this case, it is convenient to keep in mind that a scalar product (a, b) may be written as ab^T.

The reader can verify that the answer to this problem is ≈ 0.549.
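
The calculation in this example can be sketched in a few lines of Python with NumPy. The matrix `P` below is an assumption standing in for (4.2.12), which is not reproduced in this section: its first and last columns are chosen to agree with c_P = (0.04, 0.09, 0, 0) and with the expected lifetimes of Example 4.2.3-2, while the third column is filled in only so that each row sums to one. The names `C`, `c_P`, `limit` and `expected_total` are illustrative.

```python
import numpy as np

# ASSUMED stand-in for the transition matrix (4.2.12):
# states 0 (healthy), 1 (sick), 2 and 3 absorbing, with 3 = deceased.
P = np.array([
    [0.90, 0.05, 0.01, 0.04],
    [0.10, 0.80, 0.01, 0.09],
    [0.00, 0.00, 1.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
])

# Payment matrix C = ||c_ij||: one unit is paid on the transitions 0 -> 3 and 1 -> 3.
C = np.zeros((4, 4))
C[0, 3] = 1.0
C[1, 3] = 1.0

pi0 = np.array([0.94, 0.06, 0.0, 0.0])  # initial distribution
v = 0.97                                # discount factor

# c_Pi = sum_j c_ij p_ij, i.e. the diagonal of C P^T.
c_P = np.diag(C @ P.T)

# Limit (4.2.16): v * (c_P, pi0 (I - vP)^{-1}).
limit = v * (pi0 @ np.linalg.inv(np.eye(4) - v * P)) @ c_P


def expected_total(n):
    """E{S_n} by direct summation over t = 1..n, as in (4.2.14)."""
    return sum(
        v**t * (pi0 @ np.linalg.matrix_power(P, t - 1)) @ c_P
        for t in range(1, n + 1)
    )


print(round(limit, 3))
```

Under the assumed matrix, the finite-horizon sums `expected_total(n)` approach `limit` as n grows, which is the convergence stated in (4.2.16).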


4.2.5 What to do if the chain is not homogeneous 


Non-homogeneous chains are actually handled similarly, although formulas will become a bit messy. We should replace everywhere $P^t$ by ${}_0P^{(t)}$, which is the product of matrices defined in (4.1.5).
In particular, it concerns (4.2.2) and the definition of the matrix K, there. 




In Section 4.2.3, the matrix $A^t$ should be replaced by ${}_0A^{(t)} = {}_0A \cdot {}_1A \cdot {}_2A \cdots {}_{t-1}A$, where ${}_kA$ is the matrix corresponding to the transition from time k to time k + 1. However, now we do not have a geometric series, so we cannot proceed as in (4.2.11), and, hence, we are doomed to straightforward calculations.

The same concerns (4.2.5). We should replace there $(vP)^t$ by $v^t\,{}_0P^{(t)}$ and leave (4.2.5) as it is, because now we cannot proceed as in (4.2.6).

In Section 4.2.4, we should first redefine $c_{Pi}$ as ${}_{t-1}c_{Pi} = \sum_j c_{ij} \cdot {}_{t-1}p_{ij}$, since the transition probability depends now on time. The vector of payments will then be ${}_{t-1}\mathbf{c}_P = ({}_{t-1}c_{P1}, \ldots, {}_{t-1}c_{Pk})$.

In (4.2.14), we should replace $\mathbf{c}_P$ by ${}_{t-1}\mathbf{c}_P$ and $(vP)^{t-1}$ by $v^{t-1}\,{}_0P^{(t-1)}$, and stop there. In this case, calculations will be straightforward and not tractable by hand, but a good program can do it easily.


4.3 The first step analysis. An infinite horizon 


Here we consider a method proven to be efficient in many problems concerning global 
characteristics of Markov processes in the case of an infinite time horizon. In particular, as 
an example, we obtain again limiting relations from Sections 4.2.2 and 4.2.3. 

However, in order to illustrate the idea of the method, we start with an example that has 
no direct relevance to insurance. 


EXAMPLE 1 ([152, N25]*). A gambler begins with 2 chips. At each play he/she can (a) win 2 chips with probability 0.1; (b) win 1 chip with probability 0.2; (c) push (win 0 chips) with probability 0.3; (d) lose 1 chip with probability 0.3; (e) lose 2 chips with probability 0.1. Play continues as long as the gambler has exactly 2 or 3 chips. Calculate the expected number of rounds the gambler has 3 chips.

We remember that the gambler starts with 2 chips. Nevertheless, we introduce into consideration two r.v.'s: $N_2$, the number of rounds the gambler has 3 chips if he/she begins with 2 chips; and $N_3$, the number of rounds the gambler has 3 chips if he/she begins with 3 chips. Let X be the number of chips the gambler won at the first step.

Rounds are independent. Therefore, if for example the gambler wins 1 chip at the first step, we arrive at an identical situation with one exception: now the gambler has 3 chips. So, $E\{N_2 \mid X = 1\} = E\{N_3\}$. Similarly, $E\{N_3 \mid X = -1\} = 1 + E\{N_2\}$ because, if the gambler starts with 3 chips, we count the first round, and if he/she loses 1 chip at the first play, he/she starts over with 2 chips.

We have

$$E\{N_2\} = E\{N_2 \mid X = -2, \text{ or } -1, \text{ or } 2\}\,P(X = -2, \text{ or } -1, \text{ or } 2) + E\{N_2 \mid X = 1\}\,P(X = 1) + E\{N_2 \mid X = 0\}\,P(X = 0) = 0 + E\{N_3\} \cdot 0.2 + E\{N_2\} \cdot 0.3.$$

Thus, $7E\{N_2\} = 2E\{N_3\}$.

Similarly,

$$E\{N_3\} = E\{N_3 \mid X = -2, \text{ or } 1, \text{ or } 2\}\,P(X = -2, \text{ or } 1, \text{ or } 2) + E\{N_3 \mid X = -1\}\,P(X = -1) + E\{N_3 \mid X = 0\}\,P(X = 0) = 1 \cdot 0.4 + (1 + E\{N_2\}) \cdot 0.3 + (1 + E\{N_3\}) \cdot 0.3,$$

which leads to $7E\{N_3\} = 10 + 3E\{N_2\}$. Together with the previous relation, it gives $E\{N_2\} = \frac{20}{43} \approx 0.465$.

* Reprinted with permission of the Casualty Actuarial Society.
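
The two first-step relations form a small linear system, and solving it numerically is a quick check of the answer. The following is a minimal sketch with NumPy; the variable names `EN2` and `EN3` are illustrative.

```python
import numpy as np

# Unknowns x = (E{N_2}, E{N_3}). The first-step relations are
#   E{N_2} = 0.2 E{N_3} + 0.3 E{N_2},
#   E{N_3} = 1 + 0.3 E{N_2} + 0.3 E{N_3},
# rewritten as M x = b.
M = np.array([[0.7, -0.2],
              [-0.3, 0.7]])
b = np.array([0.0, 1.0])

EN2, EN3 = np.linalg.solve(M, b)
print(EN2, EN3)  # E{N_2} = 20/43, E{N_3} = 70/43
```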


4.3.1 Mean discounted payments in the case of infinite time horizon 


Consider the model of Section 4.2.2. Before, we let the finite number of steps n converge to infinity at the end of the calculations. Now, we assume that the time horizon n is infinite from the start. In other words, we consider the r.v.

$$S_\infty = \sum_{t=0}^{\infty} Y_{tX_t},$$

supposing that this infinite series converges. Under the assumptions we make below, this will be the case.

We interpret r.v.'s $Y_{ti}$ as random payments and, as in Section 4.2.2, set $c_{ti} = E\{Y_{ti}\}$ and $\mathbf{c}_t = (c_{t1}, c_{t2}, \ldots)$.

In Section 4.2.2, we obtained the representation (4.2.2) as a corollary from some more general result. Our goal here is to show that, if we are interested only in $E\{S_\infty\}$, we can calculate it by a shorter and more explicit method.

Assume that $\mathbf{c}_t = v^t\mathbf{c}$, where $\mathbf{c} = (c_1, c_2, \ldots)$ is a given payment vector not depending on time, and v is a discount factor. Set $\mu_i = E\{S_\infty \mid X_0 = i\}$, the expected total discounted payment over the infinite time interval in the case when the initial state is i.

Making use of the formula for total expectation (0.7.2.1), we write

$$\mu_i = E\{S_\infty \mid X_0 = i\} = \sum_j E\{S_\infty \mid X_1 = j, X_0 = i\}\,P(X_1 = j \mid X_0 = i), \qquad (4.3.1)$$

where summation is over all possible states j at which the process can arrive at the first step.

Consider the case when at the first step the process moves from a state i to a state j. Since the process is Markovian, once it has arrived at state j, its future evolution depends only on where it is now and not on where it was at the initial time. So, at time t = 1, we can consider the process as a process starting here with the initial state j.

At the moment t = 0, the discount equals $v^0 = 1$. So, the total discounted payment is equal to the payment $c_i$ made at time t = 0 plus the total discounted payment made at time t = 1 and at all time moments after t = 1. The only thing we should realize is that it would be a mistake to write for the second part mentioned $E\{S_\infty \mid X_0 = j\} = \mu_j$. It would have been correct if we had evaluated this part of payments from the standpoint of time t = 1. But we are at time t = 0, which means that for us the expected present value of this part is not $\mu_j$ but rather $v\mu_j$.

Eventually, $E\{S_\infty \mid X_1 = j, X_0 = i\} = c_i + v\mu_j$, and from (4.3.1) it follows that for each $i = 0, 1, \ldots$

$$\mu_i = \sum_j (c_i + v\mu_j)\,p_{ij} = c_i\sum_j p_{ij} + v\sum_j \mu_j p_{ij} = c_i + v\sum_j p_{ij}\mu_j, \qquad (4.3.2)$$

because $\sum_j p_{ij} = 1$ for all i.

To find the $\mu_i$'s, we should solve the system of equations in (4.3.2). It is easier to visualize it if we use the matrix notation.




Let vector $\boldsymbol{\mu} = (\mu_0, \mu_1, \ldots)$. As was noted repeatedly, if the symbol T stands for the transpose operation, then $\boldsymbol{\mu}^T$ is the same vector $\boldsymbol{\mu}$ viewed as a column vector. Observe that the r.-h.s. of (4.3.2) is the ith coordinate of the vector $\mathbf{c}^T + vP\boldsymbol{\mu}^T$. Then the system of equations (4.3.2) may be written in the compact form

$$\boldsymbol{\mu}^T = \mathbf{c}^T + vP\boldsymbol{\mu}^T. \qquad (4.3.3)$$

If $0 < v < 1$, the matrix $(I - vP)^{-1}$ exists, and we have

$$\boldsymbol{\mu}^T = (I - vP)^{-1}\mathbf{c}^T. \qquad (4.3.4)$$


It is worthwhile to emphasize that the main idea of the derivation of (4.3.4) consisted in 
conditioning with respect to what can happen at the first step. Note also that in our calcu- 
lations, we did not suppose that the chain was finite. 

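Next, we show that (4.3.4) implies what we obtained in Section 4.2.2. Indeed,

$$E\{S_\infty\} = \sum_i E\{S_\infty \mid X_0 = i\}\,P(X_0 = i) = \sum_i \mu_i\pi_{0i} = (\boldsymbol{\pi}_0, \boldsymbol{\mu}).$$

For any row-vectors $\mathbf{a}$, $\mathbf{b}$ of the same dimension, their scalar product $(\mathbf{a}, \mathbf{b}) = \mathbf{a}\mathbf{b}^T$. Thus, by virtue of (4.3.4),

$$E\{S_\infty\} = (\boldsymbol{\pi}_0, \boldsymbol{\mu}) = \boldsymbol{\pi}_0\boldsymbol{\mu}^T = \boldsymbol{\pi}_0(I - vP)^{-1}\mathbf{c}^T = (\boldsymbol{\pi}_0(I - vP)^{-1}, \mathbf{c}), \qquad (4.3.5)$$

which coincides with (4.2.7).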


EXAMPLE 1. Let us return to Example 4.2.2-1. Substituting the data from this example 
into (4.3.4) and using software, it is easy to get $\boldsymbol{\mu} \approx (29.473, 31.854, 36.008, 40.620)$. The
third number coincides with the answer in Example 4.2.2-1. We see also how the expected 
present value is increasing with the change of the initial state. The tendency is not surprising 
since the higher the number of a state, the “better” the state is. 
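
As an illustration of how (4.3.4) can be evaluated with software, here is a minimal NumPy sketch. The two-state matrix, the payment vector and the discount factor below are hypothetical values chosen only for the illustration (they are not the data of Example 4.2.2-1), and the function name `expected_discounted_payments` is likewise an assumption.

```python
import numpy as np


def expected_discounted_payments(P, c, v):
    """Solve mu^T = c^T + v P mu^T, i.e. mu^T = (I - vP)^{-1} c^T, cf. (4.3.4)."""
    P = np.asarray(P, dtype=float)
    c = np.asarray(c, dtype=float)
    return np.linalg.solve(np.eye(len(c)) - v * P, c)


# Hypothetical illustration data.
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
c = np.array([1.0, 2.0])
v = 0.9

mu = expected_discounted_payments(P, c, v)

# Cross-check against a truncated series sum_t v^t P^t c^T.
series = sum(v**t * np.linalg.matrix_power(P, t) @ c for t in range(400))

# E{S_inf} = (pi0, mu) for an (assumed) initial distribution pi0, cf. (4.3.5).
pi0 = np.array([0.5, 0.5])
total = pi0 @ mu
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse matrix explicitly; for a small chain either route gives the same numbers.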


In Exercise 43, we derive (4.2.11) and (4.2.13) either from (4.3.5) or directly, making use 
of the first step approach. 


4.3.2 The first step approach to random walk problems 


4.3.2.1 The probability of returning to zero. Consider the simple random walk model (Example 4.1-4) viewing, for definiteness, $X_t$ as a surplus process. Assume $p > 1/2$. Then at each step, the expected profit $E\{\xi_t\} = 2p - 1 > 0$, so the process has a tendency to "move up" on the average. Let the initial surplus $X_0 = 0$.

Denote by T the time, if any, of the first return to zero. Formally, $T = \min\{t \ge 1 : X_t = 0\}$. The probability we are computing is $P(T < \infty)$. The probability of the complement event, that is, $P(T = \infty) = 1 - P(T < \infty)$, is the probability that the process will never revisit zero. The first step approach leads to

$$P(T < \infty) = P(T < \infty \mid X_1 = 1)\,p + P(T < \infty \mid X_1 = -1)(1 - p).$$

Since $X_t = \sum_{i=1}^{t}\xi_i$ and $E\{\xi_i\} = 2p - 1 > 0$, we have $E\{X_t\} = t(2p - 1) \to \infty$ as $t \to \infty$. Then, by the law of large numbers, the process starting from (−1) will cross zero level with probability one. Hence, $P(T < \infty \mid X_1 = -1) = 1$, and setting $s = P(T < \infty \mid X_1 = 1)$, we have

$$P(T < \infty) = sp + 1 - p. \qquad (4.3.6)$$

Applying the same approach to $P(T < \infty \mid X_1 = 1)$, we write

$$s = P(T < \infty \mid X_1 = 1, \xi_2 = 1)\,p + P(T < \infty \mid X_1 = 1, \xi_2 = -1)(1 - p) = P(T < \infty \mid X_2 = 2)\,p + 1 - p. \qquad (4.3.7)$$

(When considering the first term, we have used the Markov property; when considering the second term, the fact that the condition $\{X_1 = 1, \xi_2 = -1\}$ implies that the process has already returned to zero.)

Now, $P(T < \infty \mid X_2 = 2)$ is the probability that the process will ever arrive at zero starting from state 2. This can be computed as the product of two probabilities: that the process will ever arrive at state 1 starting from state 2, and that the process will ever arrive at zero starting from state 1. The second probability was denoted by s, and the first equals s because the probability to visit state 1 starting from state 2 is equal to the probability to visit state 0 starting from state 1. The last assertion is true in view of the Markov property and the fact that the probabilities of moving up and down are the same at each step.

Thus, $P(T < \infty \mid X_2 = 2) = s^2$, and in view of (4.3.7),

$$s = s^2 p + 1 - p.$$

The last equation has two roots: $s_1 = 1$, $s_2 = (1 - p)/p$. Let us recall that $E\{\xi_i\} > 0$. In this case, from a heuristic point of view, it seems very plausible that starting from state 2 the process with some positive probability will never return to 1, and hence will never come to zero. In other words, we conjecture that $s < 1$. A not too long proof of this fact requires, however, some additional theory, and we postpone it to Example 4.5.2-1. So, we choose $s = (1 - p)/p$. Together with (4.3.6) it leads to

$$P(T < \infty) = 2(1 - p).$$


Note that when deriving the last formula, we assumed that $p > 1/2$. Similarly, if $p < 1/2$, we have $P(T < \infty) = 2p$. For $p = 1/2$, both formulas lead to $P(T < \infty) = 1$, which is indeed true, as will be shown in Example 4.5.2-1.

Let again $p > 1/2$. Then

$$P(T = \infty) = 1 - P(T < \infty) = 2p - 1 > 0.$$

We saw that, if at the first step the profit becomes negative, the process will cross zero level with probability one. Consequently, $P(T = \infty)$ is the probability that at the first step the profit will be positive, and the surplus $X_t$ will never reach zero again.
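
The formulas $P(T < \infty) = 2(1 - p)$ for $p > 1/2$ and $2p$ for $p < 1/2$ can be checked numerically by a route different from the first step argument above. The sketch below sums the classical first-return distribution of the simple random walk, $P(T = 2n) = \binom{2n}{n}(pq)^n/(2n - 1)$ with $q = 1 - p$; this formula is a standard fact brought in for the check and is not derived in this section. The function name and the truncation level `n_terms` are illustrative assumptions.

```python
def return_probability(p, n_terms=5000):
    """Partial sum of P(T = 2n) = C(2n, n) / (2n - 1) * (p q)^n for n = 1..n_terms."""
    q = 1.0 - p
    term = 2.0 * p * q  # P(T = 2)
    total = term
    for n in range(1, n_terms):
        # Ratio P(T = 2n + 2) / P(T = 2n) = 2 (2n - 1) / (n + 1) * p q.
        term *= 2.0 * (2 * n - 1) / (n + 1) * p * q
        total += term
    return total


print(return_probability(0.6))  # close to 2 * (1 - 0.6)
print(return_probability(0.3))  # close to 2 * 0.3
```

For p close to 1/2 the series converges slowly, so a fixed truncation level gives a less accurate value there.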


4.3.2.2 The ruin problem. Viewing again $X_t$ as a capital (or the surplus of a risk portfolio), we fix an integer a, and set $X_0$ equal to some $u \in [0, a]$. Assume that the process comes to a stop when it either reaches zero level (which we call ruin) or the level a, whichever comes first. For example, an investor or a player having an initial capital of u plays until the first moment when she/he either runs out of money or gets a planned amount a. See also Fig. 8.

FIGURE 8. The case of ruin (left panel); the case when the process first reaches the level a (right panel).

In the case of a risk portfolio, we can interpret a as a level after which the company stops 
accumulating the surplus and either invests a part of it or, for example, pays dividends. 

Let $A_u$ be the event that the process reaches zero level, and it happens before the process hits the level a. Set $q_u = P(A_u)$. The index u indicates that this probability depends on the initial level u.

Let $B_u$ be the event that the process reaches the a-level, and it happens before the process hits zero level. Set $p_u = P(B_u)$.

One can guess that $p_u + q_u = 1$, which is true but should be proved. Theoretically, it may happen that the process will never reach either of the boundaries of the interval $[0, a]$ ("corridor" $[0, a]$ in Fig. 8). We will show later that the probability of such an event is zero.

To find $q_u$, first note that

$$q_0 = 1, \quad q_a = 0. \qquad (4.3.8)$$

(If u = 0, the player is already ruined; if u = a, the player already has the amount a.)
For $u = 1, 2, \ldots, a - 1$,

$$q_u = P(A_u \mid \xi_1 = 1)\,p + P(A_u \mid \xi_1 = -1)(1 - p). \qquad (4.3.9)$$

If at the first step the process moves up, the random walk starts over but from the level u + 1, and the probability to be ruined under this condition becomes $q_{u+1}$. More rigorously, $P(A_u \mid \xi_1 = 1) = q_{u+1}$.

Similarly, $P(A_u \mid \xi_1 = -1) = q_{u-1}$, and we rewrite (4.3.9) as

$$q_u = p\,q_{u+1} + (1 - p)\,q_{u-1}. \qquad (4.3.10)$$


This is a difference equation. Without going deep into the theory of these equations, we note that if $q_u^{(1)}$ and $q_u^{(2)}$ are two particular independent solutions, then any solution will be a linear combination of these two, that is,

$$q_u = c_1 q_u^{(1)} + c_2 q_u^{(2)}, \quad \text{where } c_1, c_2 \text{ are some constants.} \qquad (4.3.11)$$

In the case $p = 1/2$, equation (4.3.10) may be written as

$$q_u = \frac{1}{2}(q_{u+1} + q_{u-1}),$$

and as particular solutions we can choose $q_u^{(1)} = 1$ and $q_u^{(2)} = u$ (the reader can verify this by direct substitution). Then, by (4.3.11), any solution has the form

$$q_u = c_1 + c_2 u. \qquad (4.3.12)$$

To specify the constants, we use the boundary condition (4.3.8), which gives that $1 = q_0 = c_1$, and $0 = q_a = c_1 + c_2 a$. Together with (4.3.12), it leads to

$$q_u = 1 - \frac{u}{a}. \qquad (4.3.13)$$


To find $p_u$, it suffices to observe that the distance from the initial point u to the level a is a − u, and since the walk is symmetric, the probability of hitting a first, starting from u, equals the probability of hitting 0 first, starting from a − u. In other words, $p_u = q_{a-u} = u/a$.

We see that in the symmetric case the answer is simple: the probability of not being ruined is proportional to the initial capital. We see also that $p_u + q_u$ is indeed equal to 1, that is, the probability that the process will never reach the boundaries of the corridor mentioned is zero.

Consider the case $p \ne 1/2$. If $p = 0$, obviously $q_u = 1$, so we exclude this case. The function $q_u^{(1)} = 1$ still satisfies (4.3.10), but $q_u^{(2)} = u$ does not. Direct substitution shows that now we can take $q_u^{(2)} = r^u$, where $r = (1 - p)/p$. The general solution is given in (4.3.11). To find constants $c_1$ and $c_2$, we again use (4.3.8), which leads to $1 = q_0 = c_1 + c_2$, and $0 = q_a = c_1 + c_2 r^a$. Together with (4.3.11), it implies that

$$q_u = \frac{r^u - r^a}{1 - r^a}. \qquad (4.3.14)$$

To find $p_u$, we follow the same logic as before, replacing in $q_u$ the argument u by a − u. However, since now the walk is not symmetric, we should also replace p by 1 − p. If we do that in formula (4.3.14), the new r becomes $p/(1 - p) = 1/r$. Thus,

$$p_u = \frac{(1/r)^{a-u} - (1/r)^a}{1 - (1/r)^a} = \frac{r^u - 1}{r^a - 1}. \qquad (4.3.15)$$

We see that again $p_u + q_u = 1$.

The analysis of the ruin probability formula (4.3.14) leads to some interesting corollar- 
ies. 

Following for a while the game interpretation, assume that the stake at each play is re- 
duced in half. For example, a player decides to bet not $1 but 50¢ at each play. How does 
this change the ruin probability? 

If we adopt the new stake as a unit of money, the initial capital in this new unit will be equal to 2u, and the upper level to 2a. In the symmetric case, the ruin probability (4.3.13) will not change after such a substitution, but in the case $p \ne 1/2$, the new probability will be equal to

$$q_u^* = \frac{r^{2u} - r^{2a}}{1 - r^{2a}} = \frac{r^u - r^a}{1 - r^a} \cdot \frac{r^u + r^a}{1 + r^a} = \frac{r^u + r^a}{1 + r^a}\,q_u.$$

We see that if $p < 1/2$ (and hence $r > 1$), then $q_u^* > q_u$; while if $p > 1/2$ (and hence $r < 1$), then $q_u^* < q_u$.




Thus, if a game is not favorable for a player (p < 1/2), she/he will reach the upper 
threshold with a larger probability by playing higher stakes. 


EXAMPLE 1. Let u = $9, the planned level a = $10, and p = 0.4. If the stake is $1, that is, our player is within one step of reaching the level a, then though the game is not favorable for the player, the probability to reach $10 is

p_9 = (1 − (3/2)^9) / (1 − (3/2)^10) ≈ 0.661.

(Note that this probability is larger than 0.4, the probability to "move up". The reader is invited to explain this from a heuristic point of view.)

On the other hand, if the stake had been only 10¢, the new probability of reaching $10 would have been

(1 − (3/2)^90) / (1 − (3/2)^100) ≈ 0.0173.


In the case a = ∞, the process cannot reach the level a. So, in this case we consider only the possibility of being ruined, and hence q_u is the probability of ever being ruined during an infinite interval of time.

Letting a → ∞ in (4.3.13) and (4.3.14), we have q_u → 1 for r ≥ 1, and q_u → r^u if r < 1. In other words, for a = ∞,

q_u = 1 if p ≤ 1/2, and q_u = r^u if p > 1/2. (4.3.16)
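
The ruin formulas are easy to put into code, and the difference equation (4.3.10) with the boundary conditions (4.3.8) can also be solved directly as a linear system, which gives an independent check. The following is a minimal sketch with NumPy; the function names are illustrative, not taken from the text.

```python
import numpy as np


def ruin_probability(u, a, p):
    """q_u from (4.3.13) for p = 1/2 and from (4.3.14) otherwise (0 < p < 1)."""
    if p == 0.5:
        return 1.0 - u / a
    r = (1.0 - p) / p
    return (r**u - r**a) / (1.0 - r**a)


def ruin_probability_linear_system(a, p):
    """Solve q_u = p q_{u+1} + (1 - p) q_{u-1}, q_0 = 1, q_a = 0, for u = 0..a."""
    M = np.zeros((a + 1, a + 1))
    b = np.zeros(a + 1)
    M[0, 0] = 1.0
    b[0] = 1.0      # q_0 = 1
    M[a, a] = 1.0   # q_a = 0
    for u in range(1, a):
        M[u, u] = 1.0
        M[u, u + 1] = -p
        M[u, u - 1] = -(1.0 - p)
    return np.linalg.solve(M, b)


# Example 1: p = 0.4, initial capital $9, target $10.
p_reach_stake_1 = 1.0 - ruin_probability(9, 10, 0.4)       # stake $1
p_reach_stake_010 = 1.0 - ruin_probability(90, 100, 0.4)   # stake 10 cents
print(p_reach_stake_1, p_reach_stake_010)
```

With the smaller stake the capital and the target are both measured in units of 10¢, which is why the arguments become 90 and 100.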


4.3.2.3 » Infinitesimal increments. The next approximation is useful for future modeling. It applies to the situations where the capital is changing not abruptly but in small increments during small intervals of time. For example, in the case of insurance, we may think that we observe capital each hour.

Denote the length of the time interval between consecutive steps by $\delta$, measured in some real units of time. We will view $\delta$ as very small; at the end of constructing the model, we will let $\delta \to 0$. If $\delta$ is small, it is natural to assume that the change of the capital at each epoch t is also small. Denote this change, measured in some original units of money, by $\eta_t$. Assume that $\eta_t = \pm k$ for some k with probabilities p and 1 − p, respectively. Later we will let $k \to 0$.

It is worthwhile to emphasize that we should not confuse $\eta_t$ with $\xi_t$ above, since $\xi_t = \pm 1$ in some conditional, not yet specified, unit of money.

To find natural representations for k and p, assume that the mean and the variance of $\eta_t$ are proportional to the length of the time period. More precisely, assume that $E\{\eta_t\} = \mu\delta$, $Var\{\eta_t\} = \sigma^2\delta$ for some $\mu$ and $\sigma$. Since, on the other hand, $E\{\eta_t\} = k(2p - 1)$, and $Var\{\eta_t\} = 4k^2 p(1 - p)$, we have

$$k(2p - 1) = \mu\delta, \quad 4k^2 p(1 - p) = \sigma^2\delta.$$

Simple algebra leads to

$$k = \sigma\sqrt{\delta} + o(\sqrt{\delta}), \quad p = \frac{1}{2}\Big(1 + \frac{\mu}{\sigma}\sqrt{\delta}\Big) + o(\sqrt{\delta}).$$

So, the case $\mu = 0$ corresponds to $p = 1/2$.


This section may be skipped in the first reading. 




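Now, we fit all of this to the model of the previous subsection. The initial capital u and the level a are measured in the original units of money. We adopt k as a new unit of money, and replace $\eta_t$ by $\xi_t$, taking values ±1. Accordingly, we replace u by $\tilde{u} = u/k$, and a by $\tilde{a} = a/k$. Furthermore, setting $s = \sqrt{\delta}$, we have

$$r = \frac{1 - p}{p} = \frac{\frac{1}{2}(1 - \frac{\mu}{\sigma}s) + o(s)}{\frac{1}{2}(1 + \frac{\mu}{\sigma}s) + o(s)} = \frac{1 - \frac{\mu}{\sigma}s + o(s)}{1 + \frac{\mu}{\sigma}s + o(s)} = 1 - 2\frac{\mu}{\sigma}s + o(s),$$

$$\tilde{u} = \frac{u}{k} = \frac{u}{\sigma s}(1 + o(1)), \quad \tilde{a} = \frac{a}{\sigma s}(1 + o(1)).$$

For the ruin probability $q_{\tilde{u}}$, we again use (4.3.13) if $\mu = 0$ (which corresponds to $p = 1/2$), and use (4.3.14) if $\mu \ne 0$ (which corresponds to $p \ne 1/2$). For the former case,

$$q_{\tilde{u}} = 1 - \frac{\tilde{u}}{\tilde{a}} = 1 - \frac{u}{a}(1 + o(1)) \to 1 - \frac{u}{a}.$$

Thus, in the symmetric case, the result is the same formula (4.3.13).

The case $\mu \ne 0$ is much more interesting. We write

$$r^{\tilde{u}} = \Big(1 - 2\frac{\mu}{\sigma}s + o(s)\Big)^{u/k} = \Big(1 - 2\frac{\mu}{\sigma}s + o(s)\Big)^{\frac{u}{\sigma s}(1 + o(1))} \to \exp\{-2u\mu/\sigma^2\}$$

as $s \to 0$. To get $r^{\tilde{a}}$, we should just replace u by a in the last formula.

Letting $s \to 0$, we eventually obtain that the ruin probability

$$q_{\tilde{u}} \to \frac{\exp\{-2u\mu/\sigma^2\} - \exp\{-2a\mu/\sigma^2\}}{1 - \exp\{-2a\mu/\sigma^2\}}. \qquad (4.3.17)$$

For $\mu > 0$, letting $a \to \infty$, we get that the probability of ever being ruined during an infinite interval of time is

$$\lim_{a \to \infty}\lim_{s \to 0} q_{\tilde{u}} = \exp\{-2u\mu/\sigma^2\}. \qquad (4.3.18)$$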


We will come to this formula again when considering the ruin problem in the case of 
Brownian motion; see Section 5.2.4.4. 
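
The passage to the limit can be illustrated numerically. In the sketch below, k and p are obtained by solving $k(2p - 1) = \mu\delta$, $4k^2p(1 - p) = \sigma^2\delta$ exactly (which gives $k^2 = \sigma^2\delta + \mu^2\delta^2$), and formula (4.3.14) is then evaluated at the rescaled levels $u/k$ and $a/k$. Two points are assumptions of the sketch rather than statements of the text: the rescaled levels are in general not integers, so (4.3.14) is used simply as an expression in real exponents, and the parameter values in the usage lines are hypothetical.

```python
import math


def discrete_ruin(u, a, mu, sigma, delta):
    """Ruin probability (4.3.14) for steps +-k over time intervals of length delta."""
    k = math.sqrt(sigma**2 * delta + (mu * delta) ** 2)  # exact solution for k
    p = 0.5 * (1.0 + mu * delta / k)                     # exact solution for p
    r = (1.0 - p) / p
    return (r ** (u / k) - r ** (a / k)) / (1.0 - r ** (a / k))


def limit_ruin(u, a, mu, sigma):
    """Right-hand side of (4.3.17)."""
    def e(x):
        return math.exp(-2.0 * x * mu / sigma**2)
    return (e(u) - e(a)) / (1.0 - e(a))


# Hypothetical parameters: u = 5, a = 20, mu = 1, sigma = 2.
coarse = discrete_ruin(5, 20, 1.0, 2.0, 1e-2)
fine = discrete_ruin(5, 20, 1.0, 2.0, 1e-6)
target = limit_ruin(5, 20, 1.0, 2.0)
print(coarse, fine, target)
```

The value for the smaller $\delta$ is noticeably closer to the limit, in line with the o(s) remainder terms above.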


EXAMPLE 1. Assume that the surplus for a risk portfolio grows by $\mu = 1$ units of money per day on the average, and the standard deviation of the change of the surplus is $\sigma = 2$ per day. In this situation, one may guess that the probability that the change per day will be negative is not small. Let the initial surplus be u = 10 units. Then the probability that the surplus will run out completely in an infinite time period may be estimated by $\exp\{-2u\mu/\sigma^2\} = \exp\{-5/2\} \approx 0.082$. This is a relatively high probability of being ruined, so it may make sense to start with a bigger amount of u.


4.4 Limiting probabilities and stationary distributions 


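This section concerns homogeneous chains. Consider, as an example, a particular transition matrix of a chain with two states,

$$P = \begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{pmatrix},$$

and its consecutive powers

$$P^2 = \begin{pmatrix} 0.7 & 0.3 \\ 0.45 & 0.55 \end{pmatrix}, \quad P^3 = \begin{pmatrix} 0.65 & 0.35 \\ 0.525 & 0.475 \end{pmatrix}, \quad P^6 = \begin{pmatrix} 0.60625 & 0.39375 \\ 0.590625 & 0.409375 \end{pmatrix}, \quad P^{10} \approx \begin{pmatrix} 0.6004 & 0.3996 \\ 0.5994 & 0.4006 \end{pmatrix}. \qquad (4.4.1)$$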


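We see an explicit convergence pattern, and what is important is that the two rows are getting closer to each other. This is not accidental. Consider an arbitrary two-dimensional transition matrix, which may be written as

$$P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix},$$

where $\alpha$, $\beta$ are non-negative and are not greater than one.

Let $\alpha + \beta > 0$; that is, either $\alpha$, or $\beta$, or both are positive. It is known that in this case,

$$P^t = \frac{1}{\alpha + \beta}\begin{pmatrix} \beta & \alpha \\ \beta & \alpha \end{pmatrix} + \frac{(1 - \alpha - \beta)^t}{\alpha + \beta}\begin{pmatrix} \alpha & -\alpha \\ -\beta & \beta \end{pmatrix}. \qquad (4.4.2)$$

The reader may prove it by induction on her/his own or look at a proof, e.g., in [61] or [79].

Assume, in addition, $\alpha + \beta < 2$; that is, at least one number is not 1. Thus, $0 < \alpha + \beta < 2$, and hence $-1 < \alpha + \beta - 1 < 1$. So, $|1 - \alpha - \beta| < 1$, and the second term in (4.4.2) converges to zero as $t \to \infty$. Thus,

$$P^t \to \begin{pmatrix} \pi_0 & \pi_1 \\ \pi_0 & \pi_1 \end{pmatrix} \quad \text{as } t \to \infty, \qquad (4.4.3)$$

where

$$\pi_0 = \frac{\beta}{\alpha + \beta}, \quad \pi_1 = \frac{\alpha}{\alpha + \beta}.$$

Note that the convergence is fast: $(1 - \alpha - \beta)^t \to 0$ exponentially.

In the example above, $\alpha = 0.2$, $\beta = 0.3$, and $\pi_0 = \frac{0.3}{0.2 + 0.3} = 0.6$, $\pi_1 = \frac{0.2}{0.2 + 0.3} = 0.4$, which is consistent with (4.4.1).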
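
The closed form (4.4.2) is easy to compare with directly computed matrix powers. The following minimal NumPy sketch does this for the example above; the function name `two_state_power` is illustrative.

```python
import numpy as np


def two_state_power(alpha, beta, t):
    """Closed form (4.4.2) for P^t, P = [[1-alpha, alpha], [beta, 1-beta]], alpha + beta > 0."""
    s = alpha + beta
    limit = np.array([[beta, alpha], [beta, alpha]]) / s
    transient = np.array([[alpha, -alpha], [-beta, beta]]) / s
    return limit + (1.0 - s) ** t * transient


alpha, beta = 0.2, 0.3
P = np.array([[1 - alpha, alpha], [beta, 1 - beta]])

# The powers listed in (4.4.1) agree with the closed form.
for t in (2, 3, 6, 10):
    assert np.allclose(np.linalg.matrix_power(P, t), two_state_power(alpha, beta, t))

print(two_state_power(alpha, beta, 6))
```

Since $1 - \alpha - \beta = 0.5$ here, the deviation from the limiting matrix halves at every step.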

What is remarkable in (4.4.3) is that the rows in the limiting matrix are identical. This means that, in the long run, asymptotically, the probability that the process will be in a particular state does not depend on the state from which the process started at the initial time. The process is, so to say, "gradually forgetting" the past; look again at (4.4.1).

Such a property (with some variations in definitions) is called ergodicity, and the chain itself is called ergodic.

We have established it for $0 < \alpha + \beta < 2$. If this is not true, ergodicity does not take place. Let $\alpha + \beta = 0$. Then, since $\alpha$ and $\beta$ are non-negative, both numbers are zeros, and $P = I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, the identity matrix. In this case, $P^t = I$ for all t, and the process will never leave the initial state.

If $\alpha + \beta = 2$, which is possible only if both numbers, $\alpha$ and $\beta$, equal one, then $P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, and the process alternates between two states. (Starting from state 0, the chain moves to state 1, and then comes back to state 0, and keeps moving in the same fashion.) Excepting the cases

$$P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{or} \quad P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad (4.4.4)$$

the chain is ergodic.
Certainly, the fact that we considered only two states does not matter. Consider, for instance, Examples 4.1-3 and 5, concerning a process of rearrangements. We have seen already $P^2$ in this case. The reader may compute that

$$P^5 \approx \begin{pmatrix}
0.169 & 0.165 & 0.169 & 0.165 & 0.169 & 0.165 \\
0.165 & 0.169 & 0.169 & 0.165 & 0.169 & 0.165 \\
0.169 & 0.165 & 0.169 & 0.165 & 0.165 & 0.169 \\
0.169 & 0.165 & 0.165 & 0.169 & 0.165 & 0.169 \\
0.165 & 0.169 & 0.165 & 0.169 & 0.169 & 0.165 \\
0.165 & 0.169 & 0.165 & 0.169 & 0.165 & 0.169
\end{pmatrix}.$$

After just five steps, all numbers are very close to 1/6. We will see soon that 1/6 is the limiting probability. So, rather quickly, all rearrangements are getting close to being equally likely.

Consider the shuffling interpretation. We may define a shuffling to be perfect if it leads 
to equal probabilities of all possible permutations. We see that even a simple shuffling as 
in our example is asymptotically perfect. 

In general, ergodicity takes place under some conditions, though as we will see, they are rather mild. For example, chains with the transition matrices

$$P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad (4.4.5)$$

do not possess the ergodicity property. In the first case, $P^t = I$, and the process will never leave the starting state. In the second case, the process continues to alternate between two states.


Let us consider a rigorous statement. The result below concerns finite chains and may be applied to many practical problems. Recall that for simplicity, we write $p_{ij}^m$ instead of $p_{ij}^{(m)}$, understanding it not as a power of $p_{ij}$ but as the corresponding element of $P^m$.


Theorem 4 Let the number of states $k < \infty$. Suppose that for some state $j_0$ and a natural m

$$p_{ij_0}^m > 0 \quad \text{for all } i. \qquad (4.4.6)$$

Then there exists a probability row-vector $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_k)$, such that $\pi_i \ge 0$ for all i, $\pi_1 + \ldots + \pi_k = 1$, and

$$P^t \to \begin{pmatrix} \boldsymbol{\pi} \\ \vdots \\ \boldsymbol{\pi} \end{pmatrix} \quad \text{as } t \to \infty, \qquad (4.4.7)$$

where in the limiting matrix each row is equal to the vector $\boldsymbol{\pi}$.




A proof may be found, e.g., in [35] or [120]. 

Condition (4.4.6) is the finite-chain version of the Doeblin condition. It supposes the 
existence of a state to which the process can move with positive probability from any state 
in a finite number of steps. One can say that this state is accessible from any state. 

Algebraically it means that for some m, the matrix $P^m$ has at least one strictly positive column. Certainly, if all elements of $P$ are positive, Doeblin's condition holds automatically with m = 1. For instance, this is the case for the matrix (4.1.1) in the example concerning changing intensities of a claim flow.

In the rearrangements problem of Examples 4.1-3 and 5, in $P$ itself there is no strictly positive column. However, all elements of $P^2$ are positive, so Doeblin's condition holds for m = 2.

Conditions of Doeblin’s type and corresponding theorems for the general case, when 
chains may be infinite, can be found, e.g., in [35], [36], [120]. 
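
For a finite chain, condition (4.4.6) can be tested mechanically by scanning the powers of P for a strictly positive column. The following is a minimal NumPy sketch; the function name `doeblin_m`, the cutoff `max_m` and the three-state matrix in the usage lines are hypothetical, and the test is numerical (very small positive entries could underflow to zero for large powers).

```python
import numpy as np


def doeblin_m(P, max_m=50):
    """Smallest m <= max_m such that P^m has a strictly positive column, else None."""
    P = np.asarray(P, dtype=float)
    Pm = np.eye(P.shape[0])
    for m in range(1, max_m + 1):
        Pm = Pm @ P
        # np.all(..., axis=0) checks each column; np.any asks whether some column passes.
        if np.any(np.all(Pm > 0, axis=0)):
            return m
    return None


print(doeblin_m([[0.8, 0.2], [0.3, 0.7]]))   # all entries positive: m = 1
print(doeblin_m([[1, 0], [0, 1]]))           # identity matrix from (4.4.5): None
print(doeblin_m([[0, 1], [1, 0]]))           # alternating chain from (4.4.5): None
# Hypothetical chain with one absorbing state, reachable from state 0 in two steps.
print(doeblin_m([[0, 1, 0], [0.5, 0, 0.5], [0, 0, 1]]))
```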

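Consider now $\boldsymbol{\pi}_t$, the distribution of the process at time t.


Corollary 5 Under the conditions of Theorem 4,

$$\boldsymbol{\pi}_t \to \boldsymbol{\pi} \quad \text{as } t \to \infty, \qquad (4.4.8)$$

where $\boldsymbol{\pi}$ is the same as in (4.4.7).


Proof. We have

$$\boldsymbol{\pi}_t = \boldsymbol{\pi}_0 P^t \to \boldsymbol{\pi}_0 \begin{pmatrix} \boldsymbol{\pi} \\ \vdots \\ \boldsymbol{\pi} \end{pmatrix}.$$

All elements in the ith column of the last matrix equal $\pi_i$. Hence, the ith element of the product is $\pi_{00}\pi_i + \pi_{01}\pi_i + \ldots = \pi_i(\pi_{00} + \pi_{01} + \ldots) = \pi_i \cdot 1 = \pi_i$. ∎


Thus, whatever the initial distribution $\boldsymbol{\pi}_0$ is, asymptotically, as $t \to \infty$, the distribution of the process at time t converges to the same distribution $\boldsymbol{\pi}$. In particular, this implies that


In the long run, the proportion of time when the process is in state i
is equal to $\pi_i$, the ith coordinate of the vector $\boldsymbol{\pi}$.


For more detail, see also Section 4.5.4.

The next question seeks to find the limiting distribution $\boldsymbol{\pi}$. From (4.1.8) it follows that

$$\boldsymbol{\pi}_t = \boldsymbol{\pi}_{t-1} P \qquad (4.4.9)$$

(consider one step, viewing t − 1 as the initial time). Since $\boldsymbol{\pi}_t \to \boldsymbol{\pi}$, and $\boldsymbol{\pi}_{t-1} \to \boldsymbol{\pi}$ as well, letting $t \to \infty$ in (4.4.9), we have the fundamental equation for $\boldsymbol{\pi}$:

$$\boldsymbol{\pi} = \boldsymbol{\pi} P. \qquad (4.4.10)$$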




The probability distribution $\boldsymbol{\pi}$ satisfying (4.4.10) is called a stationary distribution. The choice of the term is connected with the following fact.

Let $\boldsymbol{\pi}_0 = \boldsymbol{\pi}$, that is, the process starts from the distribution $\boldsymbol{\pi}$ from the very beginning. Then $\boldsymbol{\pi}_1 = \boldsymbol{\pi}_0 P = \boldsymbol{\pi} P = \boldsymbol{\pi}$, $\boldsymbol{\pi}_2 = \boldsymbol{\pi}_1 P = \boldsymbol{\pi} P = \boldsymbol{\pi}$, and so on: $\boldsymbol{\pi}_t = \boldsymbol{\pi}$ for all t.

Thus, for an arbitrary initial distribution $\boldsymbol{\pi}_0$, the distribution $\boldsymbol{\pi}_t$ is approaching the stationary distribution $\boldsymbol{\pi}$, while if the initial distribution is $\boldsymbol{\pi}$ itself, the distribution of the process is invariant through time and corresponds to the stationary regime from the very beginning.

Nowadays, solving equation (4.4.10), at least numerically, is not a problem even for large dimension. We restrict ourselves to the following illustrative examples.


EXAMPLE 1. Consider the process of random rearrangements with $P$ given in (4.1.3). It is straightforward to verify that for such a matrix, equation (4.4.10) is true for $\boldsymbol{\pi} = \left(\frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}, \frac{1}{6}\right)$, meaning that, as $t \to \infty$, all arrangements are equally likely.
EXAMPLE 2. Consider the process of changing intensities with matrix (4.1.1). Equation (4.4.10) may be written coordinatewise as follows:

$$\begin{aligned}
0.6\pi_0 + 0.4\pi_1 + 0.25\pi_2 &= \pi_0, \\
0.3\pi_0 + 0.5\pi_1 + 0.7\pi_2 &= \pi_1, \\
0.1\pi_0 + 0.1\pi_1 + 0.05\pi_2 &= \pi_2.
\end{aligned}$$

Together with $\pi_0 + \pi_1 + \pi_2 = 1$, it yields

$$\pi_0 = \frac{27}{56} \approx 0.482, \quad \pi_1 = \frac{71}{168} \approx 0.423, \quad \pi_2 = \frac{2}{21} \approx 0.095. \qquad (4.4.11)$$

We can interpret this as follows: over an infinite time interval, the proportion of days, for instance, with icy conditions is $\pi_2 = 2/21$.

Let us continue our example adding the information from Example 4.2.1-3. Let $Y_0, Y_1, Y_2$ be independent Poisson r.v.'s with parameters $\lambda_0 = 2$, $\lambda_1 = 4$, $\lambda_2 = 8$, respectively. Since for large t, the probability to be in state i may be approximated by $\pi_i$, the r.v. $Z_t$, the number of claims received on day t, may be approximated by a r.v.

$$Z = \begin{cases} Y_0 & \text{with probability } \pi_0, \\ Y_1 & \text{with probability } \pi_1, \\ Y_2 & \text{with probability } \pi_2. \end{cases}$$

In particular, for large t, or more rigorously, as $t \to \infty$,

$$E\{Z_t\} \to E\{Z\} = \pi_0\lambda_0 + \pi_1\lambda_1 + \pi_2\lambda_2 = 3.4166\ldots. \qquad (4.4.12)$$

We can also estimate the expected total number of claims, $S_n$, for n days. Let $m = \pi_0\lambda_0 + \pi_1\lambda_1 + \pi_2\lambda_2$. Since $E\{S_n\} = \sum_{t=1}^{n} E\{Z_t\}$, in view of (4.4.12),

$$E\{S_n\} \approx mn = 3.4166\ldots \cdot n.$$

For example, for a season of n = 60 days, the estimate is $3.4166\ldots \cdot 60 = 205$. Omitting a proof, note that the last estimate is pretty good for n = 60, since the rate of convergence in (4.4.12) is exponential, which is fairly rapid.
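
Equation (4.4.10) together with the normalization condition is a linear system, so the numbers in this example are easy to reproduce. The sketch below uses NumPy; the matrix `P` is read off from the coordinatewise equations above (column j of P holds the coefficients of the jth equation), and the function name `stationary_distribution` is illustrative. Solving the stacked system by least squares is one convenient choice, not the only one.

```python
import numpy as np


def stationary_distribution(P):
    """Solve pi = pi P together with sum(pi) = 1 via least squares on the stacked system."""
    P = np.asarray(P, dtype=float)
    k = P.shape[0]
    A = np.vstack([P.T - np.eye(k), np.ones((1, k))])
    b = np.zeros(k + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


# Transition matrix consistent with the coordinatewise equations of Example 2.
P = np.array([[0.60, 0.30, 0.10],
              [0.40, 0.50, 0.10],
              [0.25, 0.70, 0.05]])

pi = stationary_distribution(P)
lam = np.array([2.0, 4.0, 8.0])   # Poisson parameters lambda_0, lambda_1, lambda_2

m = pi @ lam        # limiting mean number of claims per day, cf. (4.4.12)
season = 60 * m     # estimate of E{S_60}
print(pi, m, season)
```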




If we want to approximate $P(Z_t \le x)$, we may, as in Example 4.2.1-3, just write $P(Z_t \le x) \approx \pi_0 \cdot \mathrm{PoissonDist}(x, 2) + \pi_1 \cdot \mathrm{PoissonDist}(x, 4) + \pi_2 \cdot \mathrm{PoissonDist}(x, 8)$.

Computing the distribution of $S_n$ is more complicated. An approximation can be provided
by making use of the CLT for Markov chains (see, e.g., [63], [120]), but we skip this
question.
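
The mixture approximation in this example is easy to evaluate. The sketch below is illustrative only: the limiting probabilities are those that solve the coordinate equations together with $\pi_2 = 2/21$, the function names `poisson_cdf` and `mixture_cdf` are hypothetical, and `poisson_cdf` is assumed to play the role of PoissonDist, that is, a cumulative distribution function.

```python
from math import exp, factorial

def poisson_cdf(x, lam):
    """P(Y <= x) for a Poisson r.v. Y with mean lam and integer x >= 0."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))

pi = [27 / 56, 71 / 168, 2 / 21]   # limiting probabilities of the three states
lams = [2, 4, 8]                   # Poisson parameters in the three states

# Limiting expected number of claims per day, and the 60-day estimate.
m = sum(p * lam for p, lam in zip(pi, lams))
print(round(m, 4), round(60 * m, 1))   # 3.4167 205.0

def mixture_cdf(x):
    """Approximate P(Z_t <= x) for large t by the mixture of Poisson cdfs."""
    return sum(p * poisson_cdf(x, lam) for p, lam in zip(pi, lams))

print(round(mixture_cdf(3), 3))
```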


EXAMPLE 3. Let a chain have one absorbing state, say,

$$
P = \begin{pmatrix}
* & * & * \\
* & * & * \\
0 & 0 & 1
\end{pmatrix}, \tag{4.4.13}
$$

where the stars $*$ represent positive numbers. It is easy to see that in this case the solution
to (4.4.10) is $\pi = (0, 0, 1)$, that is, the limiting distribution is concentrated at the last state.
This is not surprising at all, since with probability one the process must arrive at the absorbing
state. For us, it is worth noting, however, that Theorem 4 covers such cases also.


EXAMPLE 4 ([151, N1615). Drivers transition monthly between three states, “good”, 
“average”, “bad”, according to the transition matrix 


$$
P = \begin{pmatrix}
0.8 & 0.2 & 0.0 \\
0.2 & 0.6 & 0.2 \\
0.0 & 0.4 & 0.6
\end{pmatrix}.
$$


What percentage of drivers will transit from “good” to “average” between month 100 and 
month 101? 

The number 100 is “large”, so we may consider the limiting probability $\pi_0$ of being in
the state “good”. Solving (4.4.10) similarly to what we did in Example 2, we find that
$\pi_0 = 0.4$. Thus, on the average, 40% of drivers will be in the state “good” in month 100.
Since $p_{01} = 0.2$, of these 40%, on the average, 20% will transit to the second state
(“average”). Thus, the solution is $0.4 \cdot 0.2 = 0.08$, or 8%.
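
The claim that month 100 is "large" can be checked directly by raising P to the 100th power. The following plain-Python sketch (the helper names `mat_mul` and `mat_pow` are hypothetical) shows that, whatever the initial state, the probability of being "good" is 0.4 and the probability of the transition "good" to "average" is 0.08, up to rounding.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, t):
    """P raised to the integer power t >= 0 by repeated squaring."""
    n = len(P)
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    while t:
        if t & 1:
            R = mat_mul(R, P)
        P = mat_mul(P, P)
        t >>= 1
    return R

P = [[0.8, 0.2, 0.0],
     [0.2, 0.6, 0.2],
     [0.0, 0.4, 0.6]]

P100 = mat_pow(P, 100)
for start, row in enumerate(P100):
    # Probability of "good" after 100 steps, then of moving "good" -> "average".
    print(start, round(row[0], 6), round(row[0] * P[0][1], 6))
```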


Route 2 => page 303 


4.5 The ergodicity property and classification of states 


We turn to a more detailed analysis of possible states, which will allow us to better 
understand the nature of ergodicity. We consider below only homogeneous chains. 


4.5.1 Classes of states 


A state j is said to be accessible from a state i if $p_{ij}^{(m)} > 0$ for some m.

In Example 4.1-3, the state (1,3,2) is not accessible from (1,2,3) in one step, but is ac- 
cessible in two steps. So, we say that the former state is accessible from the latter (and, 
certainly, vice versa). 


Reprinted with permission of the Casualty Actuarial Society. 




States i and j are said to communicate if they are accessible from each other. 

In Example 4.1-2, state 1 is accessible from state 0, but not vice versa. In the random
walk from Example 4.1-4, all states communicate if $p \ne 0$ or 1. Indeed, if j > i, we can
move from i straight up along the path $i \to i+1 \to i+2 \to \ldots \to j$, whose probability is
$p^{j-i} > 0$. The case j < i is considered similarly.

It may be shown that states of any chain can be partitioned into disjoint classes such that 
all states from the same class communicate, and any two states from different classes do 
not (since otherwise they would belong to the same class). 

For example, a homogeneous chain with states labeled 0, 1, 2, 3 and

$$
P = \begin{pmatrix}
0.5 & 0.5 & 0 & 0 \\
0.5 & 0.5 & 0 & 0 \\
0.25 & 0.25 & 0.25 & 0.25 \\
0 & 0 & 0 & 1
\end{pmatrix} \tag{4.5.1}
$$

has three classes: {0,1}, {2}, {3}.

The chain is said to be irreducible if it has only one class. 

For instance, for the chains from Examples 4.1-1, 3 and 4, all states communicate, and
hence these chains are irreducible. (In Example 4.1-3 for $p \ne 0, 1$.)


4.5.2 The recurrence property 


Let $f_i$ be the probability that, starting from state i, the chain will ever return to this state.
State i is called recurrent if $f_i = 1$, and transient if $f_i < 1$.

For a recurrent state i, the process, starting from i, will return to i with probability one. 
Since this process is Markov, once it revisits i, the process will start over from i as from 
the beginning. After this, the process will again return to i with probability one, and so on, 
revisiting state i infinitely often with probability one. W. Feller, the author of one of the best,
if not the best, books on Probability Theory [38], called such a state persistent and regretted
that in the first edition of his book he had called it recurrent. However, the term recurrent 
is widely accepted. 

Denote the number of revisits to a state i given that the process has started from this state
by $M_i$. We do not count the initial stay in i.

We saw that

If i is recurrent, then $M_i = \infty$ with probability one.

Let $f_i < 1$. Then the probability that the process will never return to i is $P(M_i = 0) = 1 - f_i$.
The probability that the process will revisit i one time and then never come back is
$P(M_i = 1) = f_i(1 - f_i)$, and in general, $P(M_i = k) = f_i^k (1 - f_i)$. Thus, $M_i$ has a geometric
distribution and, in particular,

$$E\{M_i\} = \frac{f_i}{1 - f_i}$$

(see (0.3.1.12)). Thus,

If i is transient, then $M_i < \infty$ with probability one. Moreover, $E\{M_i\} < \infty$. (4.5.2)
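
For completeness, the expected number of revisits can be obtained directly from the geometric distribution of the number of revisits, using only the derivative of the geometric series. The following is a standard computation, given here as a sketch rather than as the book's own argument:

```latex
\begin{align*}
E\{M_i\} &= \sum_{k=0}^{\infty} k\, f_i^{\,k}(1-f_i)
          = (1-f_i)\, f_i \sum_{k=1}^{\infty} k\, f_i^{\,k-1} \\
         &= (1-f_i)\, f_i \cdot \frac{1}{(1-f_i)^2}
          = \frac{f_i}{1-f_i} < \infty \quad \text{for } f_i < 1 .
\end{align*}
```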




From (4.5.2) it immediately follows that 


For any finite Markov chain, there exists at least one recurrent state. (4.5.3) 


To show this, assume that all states are transient. Then, since each transient state can be
visited at only a finite number of moments of time, and the number of states is finite, after a
finite number of steps no state would be visited, which is impossible.

Before turning to examples, consider two more facts. 

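Let $X_0 = i$ (the process starts from i), and let the indicator r.v. $I_n = 1$ or 0, depending on
whether $X_n = i$ or not. Then $M_i = \sum_{n=1}^{\infty} I_n$.

On the other hand, $P(I_n = 1) = P(X_n = i) = p_{ii}^{(n)}$, and hence

$$E\{M_i\} = \sum_{n=1}^{\infty} E\{I_n\} = \sum_{n=1}^{\infty} P(I_n = 1) = \sum_{n=1}^{\infty} p_{ii}^{(n)}. \tag{4.5.4}$$

Thus,

State i is recurrent iff $\sum_{n=1}^{\infty} p_{ii}^{(n)} = \infty$. (4.5.5)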


We will derive from this the following proposition. 


Proposition 6 If states i and j communicate, then i and j are either both recurrent or
both transient.


In other words, 


The recurrence and transience properties are properties of classes. (4.5.6) 


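Proof of Proposition 6. If i and j communicate, by definition, there exist $m_1$ and $m_2$ such
that $p_{ji}^{(m_1)} \ge \alpha$ and $p_{ij}^{(m_2)} \ge \beta$ for some $\alpha, \beta > 0$. One of the possible ways to move from j to j
in $m_1 + n + m_2$ steps is to move in $m_1$ steps from j to i, to return to i in n steps, and to move
from i to j in $m_2$ steps. Hence,

$$p_{jj}^{(m_1+n+m_2)} \ge p_{ji}^{(m_1)}\, p_{ii}^{(n)}\, p_{ij}^{(m_2)} \ge \alpha\beta\, p_{ii}^{(n)}. \tag{4.5.7}$$

Consequently, if i and j communicate, then $\sum_{n=1}^{\infty} p_{ii}^{(n)}$ and $\sum_{n=1}^{\infty} p_{jj}^{(n)}$ converge or diverge
simultaneously.

[In more detail, since

$$\sum_{n=1}^{\infty} p_{jj}^{(m_1+n+m_2)} = \sum_{n=m_1+m_2+1}^{\infty} p_{jj}^{(n)},$$

the series $\sum_{n=1}^{\infty} p_{jj}^{(m_1+n+m_2)}$ and $\sum_{n=1}^{\infty} p_{jj}^{(n)}$ converge or diverge simultaneously. Consequently, if
$\sum_{n=1}^{\infty} p_{jj}^{(n)}$ converges, then $\sum_{n=1}^{\infty} p_{ii}^{(n)}$ converges; if $\sum_{n=1}^{\infty} p_{ii}^{(n)}$ diverges, then $\sum_{n=1}^{\infty} p_{jj}^{(n)}$ diverges.
Since i and j are arbitrary, we can switch i and j, and apply a similar argument.] ∎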




From (4.5.3) and (4.5.6) it follows that 


For any irreducible finite Markov chain, all states are recurrent. 


For instance, the chains from Examples 4.1-1 and 3 are irreducible, and hence all states 
are recurrent. 

If a finite chain has several classes, for each class we should check whether the process 
may leave the class and never come back with a positive probability. For finite chains, it is 
usually easy. 

For instance, in Example 4.1-2, state 0 (alive) is transient, and other states are recurrent, 
moreover, absorbing. 

For a chain with P from (4.5.1), the classes {0,1} and {3} contain recurrent states, and
the class {2} is transient. A chain with

$$
P = \begin{pmatrix}
0.5 & 0.5 & 0 & 0 \\
0.5 & 0.5 & 0 & 0 \\
0 & 0 & 0.75 & 0.25 \\
0 & 0 & 0.3 & 0.7
\end{pmatrix}
$$

clearly has two classes, but here all states are recurrent.
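
For finite chains, the check described above can be automated. The sketch below is illustrative and makes two assumptions that should be kept in mind: each state is treated as accessible from itself (in zero steps), and a class of a finite chain is declared recurrent exactly when the process cannot leave it. The function names `reachable` and `classify` are hypothetical.

```python
def reachable(P, i):
    """Set of states accessible from i (i itself included, in zero steps)."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v, p in enumerate(P[u]):
            if p > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def classify(P):
    """Return a list of (class, 'recurrent' or 'transient') for a finite chain."""
    n = len(P)
    reach = [reachable(P, i) for i in range(n)]
    result, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        # States that communicate with i: accessible from i and leading back to i.
        cls = {j for j in reach[i] if i in reach[j]}
        assigned |= cls
        # A class of a finite chain is recurrent iff it cannot be left.
        closed = all(reach[j] <= cls for j in cls)
        result.append((sorted(cls), "recurrent" if closed else "transient"))
    return result

P1 = [[0.5, 0.5, 0, 0], [0.5, 0.5, 0, 0],
      [0.25, 0.25, 0.25, 0.25], [0, 0, 0, 1]]      # matrix (4.5.1)
P2 = [[0.5, 0.5, 0, 0], [0.5, 0.5, 0, 0],
      [0, 0, 0.75, 0.25], [0, 0, 0.3, 0.7]]        # the second matrix above

print(classify(P1))
print(classify(P2))
```

Only the pattern of zero and non-zero entries of P is used, so the classification does not depend on the particular values of the positive transition probabilities.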
If a chain is infinite, the question may not be so simple. 


EXAMPLE 1 is classical and concerns the simple random walk; see Example 4.1-4. Let
$p \ne 0, 1$. Then, as has been shown, the chain is irreducible, and in view of (4.5.6), it suffices
to check only one state, say, 0. Clearly, the process can return to 0 only in an even number
of steps. Hence, $p_{00}^{(2k+1)} = 0$. Suppose $X_{2k} = 0$. Then the number of $\xi$'s taking the value 1
should be equal to the number of $\xi$'s taking the value $-1$. Therefore,

$$p_{00}^{(2k)} = \binom{2k}{k} p^k (1-p)^k = \frac{(2k)!}{k!\,k!}\, p^k (1-p)^k.$$

We apply Stirling's formula

$$k! \sim \sqrt{2\pi k}\; k^k e^{-k},$$

where, as usual, $a_k \sim b_k$ means $a_k / b_k \to 1$. (A proof may be found in many books on
advanced calculus. Neat and less traditional proofs are contained, e.g., in [38, II.9] and in
[116, 4.3].)

The reader can easily verify that, by Stirling's formula,

$$p_{00}^{(2k)} \sim \frac{1}{\sqrt{\pi k}}\, [4p(1-p)]^k.$$

Let $a_p = 4p(1-p)$. If $p \ne \tfrac12$, then $a_p < 1$ (recall what the maximum of $p(1-p)$ is). Then
$a_p^k$ is an exponential function in k. Hence,

$$\sum_{n=1}^{\infty} p_{00}^{(n)} = \sum_{k=1}^{\infty} p_{00}^{(2k)} < \infty, \quad \text{since } p_{00}^{(2k)} \sim \frac{a_p^k}{\sqrt{\pi k}} \text{ and } \sum_{k=1}^{\infty} \frac{a_p^k}{\sqrt{\pi k}} < \infty.$$




Consequently, by criterion (4.5.5), state 0 is transient.

Thus, if $p \ne \tfrac12$, then starting from any state, the process will never come back with a positive
probability. This should not seem surprising. If p > 1/2, then $E\{\xi_i\} = 2p - 1 > 0$, and
$E\{X_t\} = u + (2p-1)t \to \infty$ as $t \to \infty$. So, by the law of large numbers, $X_t \to +\infty$ with
probability one, and the process may be in any state only a finite number of times. If p < 1/2,
similarly $X_t \to -\infty$ with probability one.

If $p = \tfrac12$, then $a_p = 1$, and $p_{00}^{(2k)} \sim \dfrac{1}{\sqrt{\pi k}}$. In this case

$$\sum_{n=1}^{\infty} p_{00}^{(n)} = \sum_{k=1}^{\infty} p_{00}^{(2k)} = \infty, \quad \text{since } \sum_{k=1}^{\infty} \frac{1}{\sqrt{\pi k}} = \infty,$$

and state 0 is recurrent. Thus, in the case of the symmetric random walk, the process
starting from any state will return to the same state again and again, infinitely often.
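
The two regimes can be illustrated numerically. The sketch below is not part of the book's argument; the function names are hypothetical, and p = 0.6 is an arbitrary asymmetric choice. It compares the exact return probabilities with the Stirling approximation and prints partial sums of the series from criterion (4.5.5) for p = 0.5 and p = 0.6.

```python
from math import comb, pi, sqrt

def p00(k, p):
    """Exact probability of being back at 0 after 2k steps."""
    return comb(2 * k, k) * (p * (1 - p)) ** k

def stirling(k, p):
    """Asymptotic approximation (4p(1-p))^k / sqrt(pi k)."""
    return (4 * p * (1 - p)) ** k / sqrt(pi * k)

for k in (1, 10, 100):
    print(k, p00(k, 0.5) / stirling(k, 0.5))   # ratios approach 1

def partial_sum(p, K):
    """Sum of p00(k, p) over k = 1, ..., K."""
    return sum(p00(k, p) for k in range(1, K + 1))

# K is kept at 400 or below so that comb(2K, K) still fits into a float.
for K in (100, 400):
    print(K, partial_sum(0.5, K), partial_sum(0.6, K))
```

For p = 0.5 the partial sums keep growing, roughly like the square root of K, while for p = 0.6 they stabilize near 4, in agreement with the convergence of the series in the asymmetric case.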


The above classification and partitioning of the state space into classes is useful since, for 
many purposes, we can restrict our attention to one class. In particular, it is worth noting 
the following. 

If a process starts from a transient state, it can leave the class to which this state belongs. 
For example, in the case (4.5.1), the process starting from state 2 can move to state 0, and 
then it will never leave the class {0,1}. 

However, 


If the process starts from a recurrent state, it will never leave 
the class from which it has started to evolve. 


Indeed, let i be the initial state. Then the process can move to any state which is ac- 
cessible from i. If this new state had not communicated with i, it would have meant that 
the probability to come back to i is zero. Then the probability of returning to i would not 
have been one, and i would not have been recurrent. Consequently, any state to which the 
process can move from i communicates with i. Such a state, by definition, belongs to the 
same class as i. 

The same argument leads to the following fact: 


In a recurrent class, the process can reach
any state from any state with probability one. (4.5.8)


Indeed, if the probability to move from, say, state k to state i is less than one, then with 
a positive probability the process may move from i to k and not return to i, that is, the 
probability of returning to i, starting from i, would be less than one. 


4.5.3 Recurrence and travel times 


Let $T_{ik}$ be the number of steps required for the process, starting from i, to reach state k.
Then the r.v. $T_i = T_{ii}$ is a return (or recurrence) time. Set $m_{ik} = E\{T_{ik}\}$, and
$m_i = m_{ii} = E\{T_i\}$.




If state i is transient, the process will never come back to state i with the positive
probability $1 - f_i$ defined above. So, with this positive probability, $T_i = \infty$ and, hence,
$m_i = \infty$.

In general, if state i is recurrent, it also may happen that $m_i = \infty$. Such states are called
null recurrent. Otherwise, the state is called positive recurrent.


A classical example of null recurrence concerns states in the symmetric random walk
from Example 4.5.2-1: for $p = \tfrac12$ all states, as was shown, are recurrent, but the expected
return time $m_{ii} = \infty$. The direct proof is somewhat complicated; however, when we consider
Markov moments in Section 5.2.4, we will be able to prove it almost instantly.


The phenomenon mentioned is essentially connected with the fact that for the random 
walk, the number of states is infinite. For finite chains the situation is much simpler. 

Consider a chain with a finite number of states. Let i be a recurrent state, and S be the 
class containing i. As was shown above, starting from i, the process may travel only inside 
S. We can prove also the following simple proposition. 


Proposition 7 For the chain and state i defined above, $m_{ik} < \infty$ for any $k \in S$.


The proof is elementary, so we give only a sketch. Since all states from S communicate, for
each $j, k \in S$, there exists $m = m(j,k)$ such that $p_{jk}^{(m)} > 0$. Let $M = \max_{j,k \in S} m(j,k)$ and

$$\alpha = \min_{j,k \in S} p_{jk}^{(m(j,k))}.$$

Since we consider max and min over a finite set of positive numbers, $\alpha > 0$ and $M < \infty$.

Suppose the process starts from i. The probability that it will not enter state k during
M steps is less than or equal to $1 - \alpha$. In notation, $P(T_{ik} > M) \le 1 - \alpha$. If the process has
not entered k during the first M steps, the experiment is repeated, starting from the state at
which the process has arrived at the Mth step. The probability that the process will not enter
k during the next M steps is again at most $1 - \alpha$. Consequently, $P(T_{ik} > 2M) \le (1 - \alpha)^2$.

It is clear now that $P(T_{ik} > sM) \le (1 - \alpha)^s$ for any integer s. Note that $1 - \alpha < 1$ because
$\alpha > 0$. Now, by using formula (0.2.2.3), we can write that

$$E\{T_{ik}\} \le \sum_{s=0}^{\infty} M \cdot P(T_{ik} > sM) \le M \sum_{s=0}^{\infty} (1-\alpha)^s < \infty. \qquad ∎$$


The values that recurrence times $T_i$ can take are also connected with the periodicity
property. Assume that $p_{ii}^{(n)} = 0$ for all n not divisible by some number k. It means that, starting
from i, the process can revisit i only at moments k, 2k, ... . We say that d is the period of
state i if d is the largest number among all integers k with the property above. For example,
if the chain may revisit the state i with positive probabilities only at moments 3, 6, 9, ... ,
then d = 3. If d = 1, the state is called aperiodic. Clearly, if $p_{ii} > 0$, state i is aperiodic.

It may be shown, by making use of (4.5.7), that periodicity is a class property, so for an
irreducible chain all states are either aperiodic or periodic with the same period. The chain
with

$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

is obviously periodic with a period of two. The same is true for the simple random walk.
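
The definition of the period translates into a greatest common divisor of possible return times. The following sketch is a heuristic illustration only: it inspects return times up to a cutoff `max_steps` (an assumed parameter, adequate for small chains), the function name `period` is hypothetical, and the matrix `lazy` is an invented example of a state with a positive one-step return probability.

```python
from math import gcd

def period(P, i, max_steps=50):
    """gcd of all n <= max_steps with a positive n-step return probability to i.

    Only the zero pattern of P matters, so we track the set of states that
    can be occupied after n steps. Returns 0 if no return is found.
    """
    d, current = 0, {i}
    for n in range(1, max_steps + 1):
        current = {v for u in current for v, p in enumerate(P[u]) if p > 0}
        if i in current:
            d = gcd(d, n)
    return d

flip = [[0, 1], [1, 0]]        # the two-state chain from the text
lazy = [[0.5, 0.5], [1, 0]]    # hypothetical chain with p_00 > 0
print(period(flip, 0), period(lazy, 0))   # 2 1
```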


4.5.4 Recurrence and ergodicity 


We are able now to state a basic ergodicity theorem. 




Theorem 8 For all states i, j of any irreducible recurrent aperiodic chain, there exists

$$\lim_{t \to \infty} p_{ij}^{(t)} = \pi_j = \frac{1}{m_j}.$$


Analyzing this theorem, we can draw the following conclusions.

• $\lim_{t \to \infty} p_{ij}^{(t)} = 0$ iff $m_j = \infty$, that is, only when the state j is null recurrent.

• Let $m_j = \infty$. Setting i = j, we get that $\lim_{t \to \infty} p_{jj}^{(t)} = 0$. On the other hand, all states
in the chain under consideration communicate. Hence, in view of (4.5.7), if $p_{jj}^{(t)} \to 0$,
then $p_{ii}^{(t)} \to 0$ for other states i, which means that $m_i = \infty$ for all states i. In other
words, null recurrence is a class property, and for the chain under discussion (a)
either all limits $\lim_{t \to \infty} p_{ij}^{(t)}$ are equal to zero, or all are positive; (b) either all expected
recurrence (return) times $m_i$ are infinite, or all are finite.

• Recall that for any chain $\sum_j p_{ij}^{(t)} = 1$. If the chain is finite, and k is the number of states,
we can consider $\lim_{t \to \infty}$ of both sides, writing
$1 = \lim_{t \to \infty} \sum_{j=1}^{k} p_{ij}^{(t)} = \sum_{j=1}^{k} \lim_{t \to \infty} p_{ij}^{(t)} = \sum_{j=1}^{k} \pi_j$,
and having eventually $\sum_{j=1}^{k} \pi_j = 1$. (For an infinite chain we cannot bring
$\lim_{t \to \infty}$ inside the sum without additional conditions.) But the $\pi_j$'s are either all zeros, or
all positive. Since the sum equals one, the former is impossible. Thus,

For any finite recurrent chain, $\pi_j > 0$, and $m_j < \infty$ for all j.
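
The relation between limiting probabilities and expected return times can be verified on a small chain. The sketch below reuses, purely for illustration, the three-state matrix of Example 4 in Section 4.4; the function names and the number of iterations are assumptions. Expected return times are computed by first-step analysis, with the linear equations solved by simple fixed-point iteration.

```python
P = [[0.8, 0.2, 0.0],
     [0.2, 0.6, 0.2],
     [0.0, 0.4, 0.6]]
n = len(P)

def limiting_distribution(P, iters=2000):
    """Approximate the limit of row 0 of P^t by iterating v <- v P."""
    v = [1.0] + [0.0] * (len(P) - 1)
    for _ in range(iters):
        v = [sum(v[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]
    return v

def mean_return_time(P, j, iters=2000):
    """E{T_jj} by first-step analysis.

    For k != j, h[k] converges to the expected time to reach j from k, which
    satisfies h_k = 1 + sum_{l != j} P[k][l] * h_l. The same formula applied
    to k = j gives the expected return time.
    """
    size = len(P)
    h = [0.0] * size
    for _ in range(iters):
        h = [1 + sum(P[k][l] * h[l] for l in range(size) if l != j)
             for k in range(size)]
    return h[j]

pi = limiting_distribution(P)
m = [mean_return_time(P, j) for j in range(n)]
for j in range(n):
    # The last column should be close to 1, that is, pi_j = 1 / m_j.
    print(j, round(pi[j], 6), round(m[j], 6), round(pi[j] * m[j], 6))
```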


Now we state a general ergodicity theorem.

Theorem 9 For any positive recurrent aperiodic chain and all states i, j,

$$\lim_{t \to \infty} p_{ij}^{(t)} = \pi_j = \frac{1}{m_j}, \qquad \sum_j \pi_j = 1, \tag{4.5.9}$$

and the limiting vector $\pi = (\pi_1, \pi_2, \ldots)$ is a unique solution to the equation

$$\pi = \pi P, \tag{4.5.10}$$

satisfying (4.5.9).
We considered a number of examples for finite chains in Section 4.4. Next, we consider 
one example for an infinite chain. 


EXAMPLE 1 (success runs). Assume that the process may move from state i = 0, 1, 2, ...
either to state i + 1 with probability p, or to state 0 with probability q = 1 − p. In other
words, we either move up by one step, or fall to the bottom and start from the state 0.
Assume, for example, that we are dealing with repeated independent trials, and p is the
probability of success at each trial. (For instance, for an investor each day is either lucky or
not with probabilities p and q, respectively.) We are interested in the number of consecutive
successes, or in other words, we consider the length of the success run. At each step, the
length mentioned may either increase by one or drop to zero.

The transition matrix in this case is

$$
P = \begin{pmatrix}
q & p & 0 & 0 & \cdots \\
q & 0 & p & 0 & \cdots \\
q & 0 & 0 & p & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}.
$$

Hence, (4.5.10) implies

$$q \sum_{j=0}^{\infty} \pi_j = \pi_0, \qquad p\,\pi_{k-1} = \pi_k \ \text{ for all } k = 1, 2, \ldots .$$

Since $\sum_{j=0}^{\infty} \pi_j = 1$, we have $\pi_0 = q$, and eventually $\pi_i = q p^i$ for i = 0, 1, 2, ... Thus, the
limiting distribution is geometric. In Exercise 56, the reader is invited to solve a problem
where the probability of success depends on the current state.
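
The geometric solution can be checked against the balance equations numerically. In the sketch below, the success probability p = 0.7 and the truncation level N = 200 are arbitrary illustrative choices; the infinite chain is only approximated by its first N states.

```python
p, N = 0.7, 200                     # illustrative success probability, truncation
q = 1 - p
pi = [q * p**i for i in range(N)]   # candidate limiting distribution, truncated

# Balance equation for state 0: every state falls to 0 with probability q.
lhs0 = q * sum(pi)
# Balance equations for states k >= 1: state k is entered only from k - 1.
max_err = max(abs(p * pi[k - 1] - pi[k]) for k in range(1, N))

# All three printed discrepancies should be negligibly small.
print(abs(lhs0 - pi[0]), max_err, abs(sum(pi) - 1))
```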


5 EXERCISES 


Section 1 


1. Consider a counting process for which interarrival times are independent and uniform on 
(0, 1]. Show that in this case (a) increments of the process are dependent; (b) the process is 
not Markov. Does your answer to the second question answer the first? 


2. Let the independent exponential interarrival times in the scheme of Section 1.2.1 be not
identically distributed. Give an argument on whether the counting process $N_t$ still has
independent increments. Compare it with the case of identically distributed interarrival times.
(Advice: Assume, for example, that the expected value of the kth interarrival time equals k,
and observe that at any time you know how many arrivals have already occurred. Does this
information matter for understanding how long we will wait for the next arrival?)


3. Provide an Excel worksheet with realizations of Brownian motion. Play a bit with it,
considering different δ's and different numbers of points chosen.


4. Show that $\frac{1}{t} w_t \xrightarrow{P} 0$ as $t \to \infty$. In other words, $w_t$ is growing slower than t, or $w_t = o(t)$ in
probability.


5. What distribution does $X_t$ from Example 1.3-1 have? Compute $E\{X_t\}$ and $Var\{X_t\}$.


6. Provide an Excel worksheet with realizations of a discrete-time Markov process $X_t$ such that
the distribution of $X_{t+1}$ given $X_t$ is uniform on $[0, X_t]$.


Section 2 


7. Verify (2.1.3). 




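8.* Prove (2.1.7) proceeding from the following outline. For i.i.d. τ's, set $m = E\{\tau_i\} \ne 0$. For
$\varepsilon > 0$,

$$P\left(\frac{N_t}{t} - \frac{1}{m} > \varepsilon\right) = P\left(N_t > t\left(\frac{1}{m} + \varepsilon\right)\right) \le P(N_t \ge n_t),$$

where $n_t$ is the integer part of $t\left(\frac{1}{m} + \varepsilon\right)$.

By (2.1.1),

$$P(N_t \ge n_t) = P(T_{n_t} \le t) = P\left(\frac{T_{n_t}}{n_t} - m \le \frac{t}{n_t} - m\right).$$

Note that

$$\frac{t}{n_t} - m \to -\frac{\varepsilon m^2}{1 + \varepsilon m}$$

as $t \to \infty$. Hence,

$$P\left(\frac{T_{n_t}}{n_t} - m \le \frac{t}{n_t} - m\right) \le P\left(\frac{T_{n_t}}{n_t} - m \le -\frac{\varepsilon m^2}{2(1 + \varepsilon m)}\right)$$

for large t. Now note that $T_{n_t}$ is the sum of i.i.d. r.v.'s, and the last probability converges to
zero, as $t \to \infty$, by the LLN. Thus,

$$P\left(\frac{N_t}{t} - \frac{1}{m} > \varepsilon\right) \to 0$$

for any $\varepsilon > 0$. The probability

$$P\left(\frac{N_t}{t} - \frac{1}{m} < -\varepsilon\right)$$

is considered similarly.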


9. Customers arrive at a service facility according to a Poisson process with an average rate of 
5 per hour. Find 


(a) the probabilities that (i) during 6 hours no customers will arrive, (ii) at most twenty 
five customers will arrive; 


(b) the probabilities that the waiting time between the third and the fourth customers will 
be (i) greater than 30 min., (ii) equal to 30 min., (iii) greater than or equal to 30 min.; 


(c) the probability that after the first customer has arrived, the waiting time for the fifth 
customer will be greater than an hour; 


(d) for the same waiting time, its expected value and standard deviation.


10. Answer all questions from Exercise 9 under the condition that the mean interarrival time is
30 min.


11. For a particular flow of claims, the mean number of claims per day is 10.

(a) Assume that the third claim has just arrived. What is the expected waiting time for the
fifth claim?


(b) What is the probability that the third claim will come in less than three hours? 


(c) Assume that during two hours after the last claim, no claims arrived. Find the
probability that during the next half an hour there will be no claims.


12. Let $N_t$ be a Poisson process with rate λ, and let $T_i$ be the time of the ith arrival. Write

(a) $Corr\{N_2, N_4 - N_2\}$;

(b) $E\{N_t N_{t+s}\}$, $Corr\{N_t, N_{t+s}\}$;

(c) $E\{T_{n+m} \mid N_t = n\}$, $Var\{T_{n+m} \mid N_t = n\}$. (Hint: you will avoid calculations if you think
about the memoryless property.)


13. Give an example of an intensity λ(t) for which the probability that no claim will ever arrive
is 1/e.


14. Let λ(t) be changing linearly. For an interval $\Delta = [t_1, t_2]$, we define the average intensity as
$\bar{\lambda} = \tfrac12[\lambda(t_1) + \lambda(t_2)]$. Show that $N_\Delta$ is a Poisson r.v. with the parameter $\bar{\lambda}|\Delta|$, where $|\Delta|$ is the
length of Δ.


15. For a non-homogeneous Poisson flow of claims, the intensity λ(t) during the first 8 hours is
increasing as $10(t/8)^2$ [ending up with 10 claims/hour at the end of the period]. Find the
expected value and variance of the number of claims during the whole period. Given that the
fifth claim arrived at 5 h, find the probability that the sixth claim will arrive after 5 h 6 min.


16. A flow of arrivals $N_t$ is a non-homogeneous Poisson process with the periodical intensity
$\lambda(t) = |\sin \pi t|$. The unit of time is a day.

(a) What is the intensity of arrivals at the end of each day? What about the beginning of
each day?

(b) When is the intensity the largest?

(c) What is the mean and the variance of the number of arrivals during a year?

(d) Estimate without any calculations $P(|N_t - E\{N_t\}| > \sqrt{Var\{N_t\}})$ for t = 1 year.


17. Let the intensity of a non-homogeneous Poisson process λ(t) = 1 for $t \in [0,1]$, and λ(t) =
100 for $t \in [1,2]$. Explain heuristically and rigorously that the interarrival times $\tau_1$ and $\tau_2$
are dependent. (Advice: Consider not a general representation but rather the conditional
distribution of $\tau_2$ given $\tau_1 = s$ for two particular values of s; for example, for s = 1/2 and
s = 1.)


18. Let the flow of accidents corresponding to an auto insurance portfolio be well modeled by a
Poisson process with an intensity λ(t). Each accident may cause zero, one, or two injuries
with probabilities $p_0$, $p_1$ and $1 - p_0 - p_1$, respectively. (The probability of a larger number
of injuries is considered negligible.)

Can the number of injuries by time t be modeled by a Poisson process? Can it be represented
as a compound Poisson process? Can it be represented as a linear combination of independent
Poisson processes?


19. Assume that the occurrence of traffic accidents corresponding to a risk portfolio is well
described by a Poisson process with rate λ = 30 per day. The probability that a separate accident
causes serious injuries is p = 0.1. The outcomes of different accidents are independent.
Estimate the probability that during a month the number of accidents with serious injuries will
exceed 100.


20. Ann is receiving telephone calls in the claim department of an insurance company. Calls
come at a Poisson rate of 1 each 15 min. Consider a time interval and the probability that
there will be no more than one call during this interval. What length should the interval have
for this probability to be greater than 0.8?


21. Solve Exercise 20 for the case when the intensity is decreasing as λ(t) = 1/(1 + t), and the
initial time t = 0.


22. The intensity of a Poisson process is decreasing as λ(t) = 1/(1 + t). Compare the intensity
and the probability that there will be no arrivals during the period [0, t].


Section 3


23.* (a) Write formula (3.2) for the case where X's are i.i.d. and have a log-normal distribution,
and the intensity λ = 1.

(b) Write formula (3.3) for the case where X's are i.i.d. and have a log-normal distribution,
and the intensity λ(t) = 2t.


24.* For a risk portfolio, let us consider the probability that the total aggregate claim over the time
period [0, 10] will exceed the total premium paid. We need to estimate the loading coefficient
θ such that the probability mentioned will be less than, say, 0.05. In which cases below would
you apply approximation (3.3): (a) λ(t) = 17; (b) λ(t) = 1/(1 +1); (c) λ(t) = 1/(1 + t); (d)
λ(t) = 100/(1 + t)?


25. For a flow of claims, the counting process $N_t$ is the same as in Exercise 16, the size of
each claim does not depend on previous claims and has the Pareto distribution in the form
(2.1.1.17) with α = 4. Denote by $S_{(t)}$ the aggregate claim during the period [0, t]. For t =
1 year

(a) Find $E\{S_{(t)}\}$ and $Var\{S_{(t)}\}$.

(b) Estimate without calculations $P\left(S_{(t)} > E\{S_{(t)}\} + \sqrt{Var\{S_{(t)}\}}\right)$;

(c) Estimate the loading coefficient θ for which $P\left(S_{(t)} > (1 + \theta)E\{S_{(t)}\}\right) < 0.05$.


26. A flow of claims arriving at an insurance company is represented by a Poisson process in
continuous time. The mean time between two adjacent claims is half a day. The random
value X of a particular claim is uniformly distributed on [0, 10] (say, the unit of money is
$1000), the relative loading coefficient θ = 0.05. Find the mean and the variance of the
aggregate claim during a year. Estimate the probability that the aggregate claim at the end of
a year will not exceed the premium paid during the same year.


27. For a particular group of clients, a flow of claims arriving at an insurance company may
be represented as a compound Poisson process. The mean number of claims the company
receives per day is 10, the amount of a particular claim is equal to either 2, 3, or 4 with
probabilities 1/4, 1/2, and 1/4, respectively. Estimate the monthly premium the company
should charge in order that during each separate month the probability of making profit is not
less than 0.8.


28. The aggregate amount of claims arriving at an insurance company may be represented as a
compound Poisson process. The mean number of claims the company receives per day equals
three, the amount of a particular claim is uniformly distributed on [0, 1]. When charging
premiums, the company proceeds from the relative loading coefficient θ = 0.1.

(a) Find the expected value and the variance of the aggregate claim the company receives
during 100 days. What premium does the company get during this period?


(b) Find an approximate value of the initial surplus u of the company for which the
aggregate claim at the end of the period mentioned will exceed the total premium to be
received plus the surplus u, with the probability less than 0.05. (Advice: In order not
to repeat numerical calculations, carry them out at the end of the problem.)


29. John sells T-shirts at the beach. There are two types of shirts: priced $10 and $15. People
stop at the stand according to a Poisson process with rate λ = 10 per hour. However, each
customer buys a shirt with probability p = 0.3, and if she/he buys, a cheaper shirt is bought
with probability 0.75. Nobody buys two shirts.

(a) Does the number of people who bought shirts have a Poisson distribution? If no, give
an example, if yes, find the intensity of the corresponding Poisson process.

(b) Does the number of people who bought shirts for $10 have a Poisson distribution?

(c) Are the numbers of people who bought shirts for $10 and for $15 independent r.v.'s?

(d) Find the probability that the first shirt will be purchased during the first hour, and before
this purchase two customers will stop at the stand but will not buy anything. (Advice:
This means that there will be at least three arrivals during the first hour but the first
two arrivals will be unlucky for John. Look up the joint distribution of N4, ..., N; given
N = n in Section 3.3.2.1.)

(e) Using normal approximation, estimate the probability that the total sales of shirts
during 8 hours is greater than $250.


Section 4*


30. Explain why in Example 4.1-1, the probability to move from the state “rainy today, and rainy
yesterday” to the state “rainy today, and normal yesterday” is zero.


31. Would you assume that $h_s$ in Example 4.1-2 is monotone in s?


32. Compute the probabilities to move from each state to each state in two steps for Examples
4.1-1,2. Regarding Example 2, find the probability to survive two years being of age x in two
ways: using multiplication of matrices and just proceeding from common sense.


33. Compute the probability of the path $0 \to 0 \to 1$ for Examples 4.1-1,2, and the probability of
the path $0 \to 1 \to 2 \to 1 \to 0$ for Example 4.1-4.


34. Provide an Excel worksheet for Example 4.1-7 and play a bit, changing the transition
matrix and watching what happens. Consider several realizations, generating different random
numbers.


35. Compute $E\{Z_3\}$ and $Var\{Z_3\}$ in the situation of Example 4.2.1-3. Do the same for the case
when r.v.'s $Y_i$ have exponential distributions, and $E\{Y_i\} = i$.


36. Compute, using Excel or other software, $E\{S_5\}$ for the situation from Example 4.2.1-4.


37. Does the discount factor depend on the size of the cash flow in the models we consider here?
Is it always the case in reality?


38. Mr. M. runs a business. Each year, with a probability of 0.1, Mr. M. cancels this business. If
this does not happen, he faces either “bad” or “good” year with respective average incomes
1 or 2. The transition probabilities for these two states are specified by the matrix

$$P = \begin{pmatrix} 0.6 & 0.4 \\ 0.3 & 0.7 \end{pmatrix},$$


where zero state corresponds to a “bad” year. The initial year is “good”. Evaluate the
expected discounted present value of the total income for the business under consideration in
the long run for v = 0.9.


39. Using Excel or other software, provide solutions for different values of v in the situation
of Example 4.2.2-1. Graph the estimate for $E\{S_n\}$ against v and explain the tendency from
an economic and mathematical point of view.


40. Using software, compute the expected present value of total payments during 3 periods for
the data from Example 4.2.2-1.


41. Assume that, in the situation of Example 4.2.1-2, we distinguish three health conditions:
‘healthy’, ‘sick’, and ‘died’, with

$$
P = \begin{pmatrix}
0.85 & 0.14 & 0.01 \\
0.6 & 0.35 & 0.05 \\
0 & 0 & 1
\end{pmatrix}.
$$

(The example is designed merely for illustration, but it contains an attempt to reflect a
possible situation on the average. Usually, for young people the numbers in the first column are
larger than in the table above, and for old people, smaller. To some extent, it may make the
homogeneity assumption less restrictive.)

The mean annual health care cost in the two first states, ‘healthy’ and ‘sick’, equals 1 and
4, respectively. Assume that at the initial moment, 94% of the clients of the company are
healthy. Find the actuarial present value of the total cost in the long run with a discount of
0.9. Find the expected absorption time, i.e., the expected lifetime.


42. For the insurance in Example 4.2.4-1,

(a) write the matrix C;

(b) verify the answer obtained in this example;

(c) under the assumption that a chosen person will die within a year with probability one,
write without any calculations what $S_n$ should be, and show that formula (4.2.15) does
not contradict your answer.


43. Derive (4.2.11) and (4.2.13) from (4.3.5).


44. In the situation of Example 4.2.3-1, using the first-step-analysis approach, find the expected
time of being healthy (the number of moments of being in state 0) for an insured who was
healthy at the beginning, and for the insured chosen at random.


45. Show that the transition matrices in (4.4.5) do not satisfy Doeblin's condition, while the
matrix in Example 4.4-4 does. Show that in the last case we do not even have to compute a
power of P.


46. Consider Example 4.4-1 for k books (or cards). Show that in the long run all k! permutations
asymptotically are equally likely.


47. Does Doeblin's condition of Theorem 4 hold for the transition matrix (4.4.13)? Connect it
with what was said in Example 4.4-3.


48. Does Doeblin's condition of Theorem 4 hold for the transition matrix

$$
P = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 \\
0.7 & 0.3 & 0 & 0 \\
0 & 0 & 0.5 & 0.5 \\
0 & 0 & 0.1 & 0.9
\end{pmatrix}?
$$

Argue why in this case we should not hope for ergodicity.


49. Assume that in the situation of Example 4.4-2, in the stationary regime, on a day, the number
of claims turns out to be six. Find the probability that the weather on this day corresponds to
the icy road condition.


50. Peter runs a small business classifying each day as good, or moderate, or bad. Transitions
correspond to a Markov chain with transition matrix (4.1.1). A daily income is a random
variable having the Pareto distribution in the form (2.1.1.18) with α = 4 and parameter θ =
4, 5, and 6, respectively, depending on the type of the day. Find

(a) the expected daily income in the long run;

(b) the probability that, in the stationary regime, the income on a particular day will exceed
two units of money.


51. In the situation of Example 4.2.2-1, find the stationary distribution for the transition matrix
(4.2.8). Discuss the result in terms of “to what extent the investment climate is good, what
proportion of the years is good”.


52.** For a short period, we can assume that the probabilities in (4.1.2) do not depend on time.
What classes does such a chain have?


53.** (a) Show that, if there is more than one state, and $p_{ii} = 1$ for some i, then the chain is
reducible.

(b) Give an example of a reducible chain for which all transition probabilities are less than
one.

(c) Show that in order to specify the classes of a chain, we do not need to know particular
values of transition probabilities but only which of them are not equal to zero. Why
does it not contradict the statement of Exercise 53a?


54. Classify the states of

$$
P = \begin{pmatrix}
0 & 0 & 0.5 & 0.5 \\
1 & 0 & 0 & 0 \\
0 & 0.75 & 0 & 0.25 \\
0 & 1 & 0 & 0
\end{pmatrix}.
$$


55.** Show that, while a chain with $P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ is clearly periodic with a period of 2, a chain with

$$
P = \begin{pmatrix}
0 & 1/2 & 1/2 \\
1 & 0 & 0 \\
1/3 & 1/3 & 1/3
\end{pmatrix}
$$

is aperiodic.


56.**
In the problem from Example 4.5.4-1, assume that from state i the process moves to state 
i+ 1 with probability p; = th and to state 0 with probability qi = 1 — pi, where a number 
$\lambda < 1$, and $i = 0, 1, 2, \ldots$.


(a) Proceeding from (4.5.10), find the limiting distribution $\pi$.


(b) Let $T_0$ be the return time to state 0 (not counting the initial stay at 0). Find the
distribution of $T_0$ and its mean value.


Chapter 5 


Random Processes and their Applications. 
II: Brownian Motion and Martingales 


We continue to consider different types of random processes keeping the notation from 
Chapter 4. 


1 BROWNIAN MOTION AND ITS GENERALIZATIONS 


In this section, we revisit Brownian motion, or the Wiener process, $w_t$ defined in Section
4.1.2.2.


1.1 More on properties of the standard Brownian motion 
1.1.1 Non-differentiability of trajectories 


The definition of $w_t$ in Section 4.1.2.2 requires the trajectories of the process to be
continuous. Let us turn to differentiability.

As before, let $w_\Delta$ stand for the increment of $w_t$ over an interval $\Delta$, and let $\Phi(x)$
denote the standard normal d.f.

To determine whether $w_t$ has a derivative at a point $t$, consider a time interval
$\Delta = (t, t+\delta]$ and explore the behavior of the r.v. $\eta_\Delta = w_\Delta/\delta$ as $\delta \to 0$.

By definition, the r.v. $w_\Delta$ is normal with zero mean and a standard deviation of $\sqrt{\delta}$. Then
for $x > 0$, we have

$$P(|\eta_\Delta| > x) = P(|w_\Delta| > x\delta) = 2(1 - \Phi(x\delta/\sqrt{\delta})) = 2(1 - \Phi(x\sqrt{\delta})) \to 2(1 - \Phi(0)) = 1 \quad \text{as } \delta \to 0.$$

Since the last relation is true for an arbitrarily large $x$, this means that when $\delta$ is
approaching zero, the r.v. $|\eta_\Delta|$ takes on arbitrarily large values with a probability close to
one. Rigorously, $|\eta_\Delta| \to \infty$ as $\delta \to 0$ in probability (for a definition of
this type of convergence see Section 0.5).
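The limit above can be checked numerically. The following is only a small sketch: the function names and the chosen values of $x$ and $\delta$ are illustrative assumptions, and the normal d.f. is computed through the error function.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal d.f. Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_quotient_exceeds(x: float, delta: float) -> float:
    """P(|w_Delta / delta| > x) = 2 * (1 - Phi(x * sqrt(delta))) for |Delta| = delta."""
    return 2.0 * (1.0 - std_normal_cdf(x * math.sqrt(delta)))


# Illustrative threshold x = 100: the probability approaches one as delta shrinks.
for delta in (1.0, 1e-2, 1e-4, 1e-6, 1e-8):
    print(f"delta = {delta:g}: P(|eta| > 100) = {prob_quotient_exceeds(100.0, delta):.4f}")
```

However large the threshold $x$ is taken, a sufficiently short interval makes the difference quotient exceed it with probability close to one.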

As a matter of fact, an even stronger property is true. Namely, with probability one,
trajectories of Brownian motion (that is, $w_t$ as a function of $t$) are nowhere differentiable;
i.e., the derivative does not exist at any point. (See, e.g., [112, p.32] or the outline of a proof
in [70, p.268].)

This is an amazing property. Trajectories are continuous but not smooth, and the process
fluctuates infinitely frequently in any arbitrarily small time interval. However, this is not an
obstacle for applications. If we are interested in the increments of the process over intervals
that perhaps are small but not infinitesimally small, then we are dealing with r.v.’s $w_\Delta$ which
are normal in the mathematical and usual sense as well, and hence are tractable.


When, in 1872, K. Weierstrass constructed a function that was continuous but non-differentiable
at any point, it was a significant mathematical achievement. Some people considered this
function pathological, others a mathematical masterpiece, but regardless, this function looked
exotic. Nowadays, the Wiener process, whose trajectories are functions with the same property,
serves as a good model for many applied problems.


FIGURE 1.


1.1.2 Brownian motion as an approximation. The Donsker-Prokhorov
invariance principle


Let $\xi_1, \xi_2, \ldots$ be i.i.d. r.v.’s having zero means and unit variances. Let $S_0 = 0$ and
$S_k = \xi_1 + \ldots + \xi_k$.

Let us consider the time interval $[0,1]$ and for each $n = 1, 2, \ldots$, construct a piecewise
linear (or polygonal) random process $X_t^{(n)}$ on $[0,1]$ as follows.
We divide $[0,1]$ into $n$ intervals $\Delta_k = (t_{k-1}, t_k]$, $k = 1, \ldots, n$, of the same length $1/n$. The
end points of these intervals are the points $t_k = t_{kn} = k/n$. At the points $t_k$, we set (see also
Fig. 1)

$$X_{t_k}^{(n)} = \frac{S_k}{\sqrt{n}},$$

and for $t_{k-1} < t < t_k$, we define $X_t^{(n)}$ as a linear function whose graph connects points
$(t_{k-1}, X_{t_{k-1}}^{(n)})$ and $(t_k, X_{t_k}^{(n)})$; see again Fig. 1.

The process so constructed is called a partial sum process. We may view it as the sequence
of partial sums $S_1, \ldots, S_n$, compressed in such a way that it runs in the interval $[0,1]$. Since
the $\xi_k$’s are independent, the process $X_t^{(n)}$ is that with independent increments. We will see that
for large $n$, the fluctuations of the piecewise linear process $X_t^{(n)}$ are approaching those of
Brownian motion.

Proposition 1 For any $t$,

$$X_t^{(n)} \xrightarrow{d} w_t \quad \text{as } n \to \infty, \tag{1.1.1}$$

where the convergence $\xrightarrow{d}$ means the convergence of the distributions of the
corresponding r.v.’s (see also Section 0.5).
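A quick simulation illustrates Proposition 1. This is a sketch under stated assumptions: the steps are taken to be $\pm 1$ with equal probabilities (one admissible choice of zero-mean, unit-variance r.v.’s), and the function names, sample sizes, and seed are hypothetical.

```python
import math
import random


def partial_sum_process_value(steps, t):
    """Value at time t in [0, 1] of the piecewise linear process built from `steps`."""
    n = len(steps)
    k = math.ceil(t * n)             # smallest integer not less than t*n
    if k == 0:                       # t = 0: the process starts at zero
        return 0.0
    s_prev = sum(steps[:k - 1])      # S_{k-1}
    t_prev = (k - 1) / n             # t_{k-1}
    weight = (t - t_prev) * n        # (t - t_{k-1}) / (t_k - t_{k-1})
    return (s_prev + weight * steps[k - 1]) / math.sqrt(n)


def sample_values(n, t, n_samples, seed=0):
    """Independent draws of the process value at time t, with +1/-1 steps."""
    rng = random.Random(seed)
    return [
        partial_sum_process_value([rng.choice((-1, 1)) for _ in range(n)], t)
        for _ in range(n_samples)
    ]


values = sample_values(n=200, t=0.5, n_samples=2000)
mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
print(f"sample mean {mean:.3f}, sample variance {variance:.3f}, variance of w_t: 0.5")
```

The sample mean and variance should be close to 0 and $t$, the mean and variance of $w_t$; a histogram of `values` would look approximately normal.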




We prove it at the end of this section, but first note that, as a matter of fact, an essentially
stronger assertion is true. Namely, not only do the marginal distributions (for separate $t$’s)
of the process $X_t^{(n)}$ converge to the corresponding marginal distributions of $w_t$, but the
probability distribution of the process $X_t^{(n)}$ as a whole (that is, the joint distribution of
the values of the process at different time moments $t$) converges to the distribution of the
standard Wiener process. This fact is referred to as the Donsker-Prokhorov invariance
principle. A rigorous statement may be found, for example, in [47], [70]. By virtue of this
principle, the Wiener process may be viewed as a continuous approximation of the partial
sum process.


EXAMPLE 1. Let $\xi_i$ take on values $\pm 1$ with equal probabilities. Then the process of
partial sums corresponds to the symmetric random walk considered in Section 4.4.3.2.2.
We have shown there that, starting from a level $u$, the symmetric random walk will reach
a level $a$ before hitting zero level with the probability $p_u = u/a$. The corresponding ruin
probability is $q_u = (a - u)/a$. Consider the process $X_t = u + w_t$. This is Brownian motion
starting from level $u$. In view of the invariance principle, we may conjecture that the
corresponding ruin probability for $X_t$ will be the same as for the symmetric random walk. In
Section 2.4.4, we will show that this is indeed true.


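Proof of Proposition 1. For a fixed $t \in (0,1]$, let $k = k(n)$ be the smallest integer which
is not less than $tn$. Formally, $k = tn$ if $tn$ is an integer, and $k(n) = [tn] + 1$ otherwise. (As
usual, $[a]$ denotes the integer part of $a$.) Because $t > 0$, we have $k(n) \to \infty$ as $n \to \infty$.

For $k$ so defined, $t \in \Delta_k$. Indeed, if $tn$ is an integer, then $t = k/n \in \Delta_k$. If $tn$ is not an integer,
we have $tn < k < tn + 1$, which implies $\frac{k-1}{n} < t < \frac{k}{n}$.

Since the $\xi$’s have zero means and unit variances, $E\{S_{k(n)}\} = 0$ and $\mathrm{Var}\{S_{k(n)}\} = k(n)$.

Because $X_t^{(n)}$ is linear on $[t_{k-1}, t_k]$,

$$X_t^{(n)} = \frac{S_{k-1}}{\sqrt{n}} + \frac{t - t_{k-1}}{t_k - t_{k-1}} \cdot \frac{\xi_k}{\sqrt{n}}$$

(see also Fig. 1). By construction, $k(n)/n \to t$ as $n \to \infty$. By the CLT,

$$\frac{S_{k-1}}{\sqrt{n}} = \sqrt{\frac{k-1}{n}} \cdot \frac{S_{k-1}}{\sqrt{k-1}} \xrightarrow{d} \sqrt{t}\, Z,$$

where $Z$ is a standard normal r.v. Also, since the factor $\frac{t - t_{k-1}}{t_k - t_{k-1}}$ lies between 0 and 1,

$$\left| \frac{t - t_{k-1}}{t_k - t_{k-1}} \cdot \frac{\xi_k}{\sqrt{n}} \right| \le \frac{|\xi_k|}{\sqrt{n}} \to 0 \quad \text{in probability.}$$

Hence, $X_t^{(n)} \xrightarrow{d} \sqrt{t}\, Z$. The r.v. $\sqrt{t}\, Z$ is normal with zero mean and variance $t$, that is, it has
the same distribution as $w_t$. ∎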


1.1.3 The distribution of $w_t$, hitting times, and the maximum value of
Brownian motion


First, note that since by definition $w_t$ is normal with zero mean and variance $t$, we can
explicitly write its density and the d.f. In accordance with (0.3.2.16), the density of $w_t$ is

$$f_t(x) = \frac{1}{\sqrt{2\pi t}} \exp\left\{ -\frac{x^2}{2t} \right\}, \tag{1.1.2}$$

and the d.f.

$$F_t(x) = P(w_t \le x) = \Phi\left( \frac{x}{\sqrt{t}} \right) \tag{1.1.3}$$

for all $t > 0$.

The next two definitions are illustrated in Fig. 2. We set $\tau_a = \min\{t \ge 0 : w_t = a\}$, the
time at which the process, starting from zero, first reaches (or hits) a level $a$. (In Chapter 4,
the symbol $\tau$ stood for an interarrival time, but it should not cause confusion here as we do
not consider interarrival times in the current chapter.)

Let $\overline{w}_t = \max_{s \le t} w_s$, the maximal value of the process over the interval $[0,t]$. (Since
trajectories of $w_t$ are continuous with probability one, the maximum exists with probability one.)
We explore two characteristics, $\tau_a$ and $\overline{w}_t$, together because they are strongly connected.
Namely,

$$\tau_a \le t \quad \text{iff} \quad \overline{w}_t \ge a. \tag{1.1.4}$$

Indeed, if the maximum value of the process over the period $[0,t]$ was greater than or equal
to $a$, then the process “had to cross” the level $a$. Since the process is continuous, it could
not overshoot $a$ and was equal to $a$ at the moment of crossing.

By the formula for total probability,

$$P(w_t \ge a) = P(w_t \ge a \mid \tau_a > t) P(\tau_a > t) + P(w_t \ge a \mid \tau_a \le t) P(\tau_a \le t). \tag{1.1.5}$$

The first conditional probability clearly equals zero because if the continuous process $w_t$
reached the $a$-level for the first time after time $t$, the process cannot be larger than $a$ at time $t$.
The second conditional probability

$$P(w_t \ge a \mid \tau_a \le t) = 1/2. \tag{1.1.6}$$

To show this, one may reason as follows. The random moment $\tau_a$ is less than or equal to $t$.
The value of the process at the moment $\tau_a$ is exactly equal to $a$. Hence, if $t > \tau_a$, the r.v.

$$w_t = a + w_{(\tau_a, t]},$$

where $w_{(\tau_a, t]}$ is the increment of the process over the remaining time interval $(\tau_a, t]$; see Fig. 2.
Then, $w_t$ will be greater than or equal to $a$ only if $w_{(\tau_a, t]} \ge 0$. In view of symmetry, $w_{(\tau_a, t]}$ is
equally likely to be positive or negative, which implies (1.1.6).


> More precisely,

$$P(w_t \ge a \mid \tau_a \le t) = P(a + w_{(\tau_a, t]} \ge a \mid \tau_a \le t) = P(w_{(\tau_a, t]} \ge 0 \mid \tau_a \le t) = P(w_{(\tau_a, t]} \ge 0). \tag{1.1.7}$$

The last step is true because $w_t$ is a process with independent increments: once $w_t$ has hit
the level $a$, the future evolution of $w_t$ does not depend on the hitting time.

The r.v. $w_{(\tau_a, t]}$ is not normal, since the length of $(\tau_a, t]$ is random, but it is a symmetric
r.v. Indeed, given $\tau_a = u$ for some $u$, the r.v. $w_{(\tau_a, t]} = w_{(u, t]}$. This r.v. is normal and, in particular,
symmetric for any $u$. Hence, $w_{(\tau_a, t]}$ is symmetric and, in particular, it is equally likely to be
positive or negative. So, $P(w_{(\tau_a, t]} \ge 0) = \frac{1}{2}$. From this and (1.1.7), we get (1.1.6). <

Thus, (1.1.5) implies that $P(w_t \ge a) = \frac{1}{2} P(\tau_a \le t)$. Combining it with (1.1.3), we
eventually obtain that

$$P(\tau_a \le t) = 2(1 - \Phi(a/\sqrt{t})). \tag{1.1.8}$$

Then, in view of (1.1.4),

$$P(\overline{w}_t < a) = 2\Phi(a/\sqrt{t}) - 1. \tag{1.1.9}$$

In (1.1.8)-(1.1.9), we consider the distribution functions of two r.v.’s: $\tau_a$ and $\overline{w}_t$. In
(1.1.8), the argument of the d.f. is $t$, and $a$ is a parameter, while in (1.1.9), the roles of these
two quantities switch: $a$ is the argument of the d.f., and $t$ is a parameter.
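Formulas (1.1.8) and (1.1.9) are easy to evaluate. The sketch below uses hypothetical function names and assumes $a > 0$ and $t > 0$; it tabulates the d.f. of the hitting time of the level $a = 1$ and checks that the two formulas are complementary.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal d.f. Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hitting_time_cdf(a: float, t: float) -> float:
    """P(tau_a <= t) by formula (1.1.8); a is a parameter, t is the argument."""
    return 2.0 * (1.0 - std_normal_cdf(a / math.sqrt(t)))


def running_max_cdf(a: float, t: float) -> float:
    """P(max of w_s over [0, t] is below a) by formula (1.1.9); t is a parameter."""
    return 2.0 * std_normal_cdf(a / math.sqrt(t)) - 1.0


for t in (0.25, 1.0, 4.0, 100.0):
    print(f"t = {t:6.2f}: P(tau_1 <= t) = {hitting_time_cdf(1.0, t):.4f}, "
          f"P(max < 1) = {running_max_cdf(1.0, t):.4f}")
```

The first column grows toward one as $t$ increases, though slowly: by (1.1.8), the probability that the level has not been reached by time $t$ decays only like $1/\sqrt{t}$.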


EXAMPLE 1. Assume that you own a stock whose current price per share is $S_0 = 100$.
The price changes in time as the process $S_t = S_0 \exp\{\sigma w_t\}$. In this context, the parameter
$\sigma$ is called a volatility. In Example 4.1.3-1, we already mentioned that such a model may
be sufficiently adequate in many situations; a more general model will be considered in
Sections 1.3 and 2.4.6.

Let $\sigma$ equal, say, 0.15. You have decided to sell your shares when the price increases by
10%. What is the probability that this will not happen within the first year?

You are going to sell your stock at the first time $t$ when $S_t \ge 1.1 S_0$. This is equivalent to
the inequality $\exp\{\sigma w_t\} \ge 1.1$, or $w_t \ge \frac{1}{\sigma}\ln(1.1) \approx 0.63$. (Note that the answer does not
depend on $S_0$.) So, you will not sell your shares if $w_t$ does not reach the level 0.63 during
the time period $[0,1]$. The probability of this event is $P(\overline{w}_1 < 0.63)$ (why?) and equals
$2\Phi(0.63) - 1 \approx 0.47$. Note that $\overline{w}_t$ is a continuous r.v., so it does not matter whether to
write $P(\overline{w}_t < x)$ or $P(\overline{w}_t \le x)$.
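The arithmetic of this example can be reproduced in a few lines. This is a sketch with hypothetical variable names. Note that the unrounded level is about 0.635, while the text works with 0.63; both give a probability of about 0.47.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal d.f. Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


sigma = 0.15          # volatility from the example
target_growth = 1.1   # sell when the price has grown by 10%
horizon = 1.0         # one year

# S_t >= 1.1 * S_0 is equivalent to w_t >= ln(1.1) / sigma.
level = math.log(target_growth) / sigma

# By (1.1.9), the running maximum stays below the level with this probability.
prob_no_sale = 2.0 * std_normal_cdf(level / math.sqrt(horizon)) - 1.0

print(f"level = {level:.4f}, P(no sale within the year) = {prob_no_sale:.4f}")
```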


Formula (1.1.9) leads to an unexpected, at first glance, conclusion. Consider the
probability that during a fixed time interval $[0,T]$ the process will take on only non-positive
values. (In this case, the graph of the realization will be under the $t$-axis.) This probability
is $P(\overline{w}_T \le 0)$. In accordance with (1.1.9), it is equal to zero for any arbitrarily small $T > 0$.
This means that starting from zero, the process cannot move down taking for a while only
negative values. On the contrary, during any arbitrarily small interval $[0,T]$, the process
will cross zero level with probability one. It may be proved that before “leaving zero”, the
process fluctuates around zero infinitely often, rapidly oscillating around zero, so to speak.
Since the state of the process at any time may be viewed as the initial state with respect
to the future evolution, the same conclusion concerns the behavior of the process around
any point. The evolution of the process is by no means smooth but consists of an infinite
number of small but frequent movements up and down.




1.2 The Brownian motion with drift 


In Chapter 0, we defined the normal r.v. with mean $m$ and variance $\sigma^2$ as a r.v. $m + \sigma X$,
where the r.v. $X$ is standard normal.
We define a Brownian motion with drift parameter $\mu$ and variance parameter $\sigma^2$ as the
process

$$X_t = \mu t + \sigma w_t, \quad t \ge 0, \tag{1.2.1}$$

where $w_t$ is the (standard) Brownian motion.

Since $w_t$ is a process with independent increments, so is $X_t$, and for any time interval $\Delta$,
the increment $X_\Delta$ is normal with mean $\mu|\Delta|$ and variance $\sigma^2|\Delta|$, where $|\Delta|$ stands for the
length of $\Delta$. In particular, the r.v. $X_t$ is $(\mu t, \sigma^2 t)$-normal, and hence for any $t > 0$, the density
of $X_t$ is

$$f_t(x) = \frac{1}{\sigma\sqrt{2\pi t}} \exp\left\{ -\frac{(x - \mu t)^2}{2\sigma^2 t} \right\}. \tag{1.2.2}$$


1.2.1 Modeling of the surplus process. What a Brownian motion with 
drift approximates in this case 


For a risk portfolio, let $c_t = ct$ be the premium collected by time $t$. Here, $c$ is the rate at
which the premium is accumulating. Denote by $S_{(t)}$ the total claim paid during the period
$[0,t]$. For simplicity, we will not consider the initial surplus. Thus, the surplus at time $t$ is
$R_t = c_t - S_{(t)}$.

In some situations, a Brownian motion with drift, $\mu t + \sigma w_t$, may be a good model for the
process $R_t$. It is worth emphasizing, however, that in such a model, $\mu$ is the expected value
of the profit per unit of time, that is, the premium minus the mean total amount of claims
per unit of time.

Roughly speaking, we can adopt such a model in situations when the process “looks
almost continuous”, and during any small period of time, the profit is small and proportional
to the length of the period on the average. When $\mu = 0$, we can think about the limiting
model in the framework of the invariance principle from Section 1.1.2. A simple example
of an approximation with $\mu \ne 0$ is given in Section 4.4.3.2.3.

The Brownian motion approximation may also work well in the case when $S_{(t)}$ is a
compound process, but here we should be cautious in interpretations.

For example, assume that $S_{(t)}$ is a compound Poisson process (see Section 4.3). More
precisely, let

$$S_{(t)} = \sum_{i=1}^{N_t} \xi_i,$$

where $N_t$ is a homogeneous Poisson process with rate $\lambda$, and $\xi_i$ is the size of the $i$th claim.
(Here we chose the symbol $\xi$ instead of $X$ since in this context $X_t$ stands for the random
process.)

Let $m = E\{\xi_i\}$, and $\kappa^2 = E\{\xi_i^2\}$. In our case, $E\{S_{(t)}\} = m\lambda t$ and $\mathrm{Var}\{S_{(t)}\} = \kappa^2 \lambda t$
(see Section 4.3). In order to approximate $R_t = ct - S_{(t)}$ by $X_t = \mu t + \sigma w_t$, we should set
$E\{X_t\} = E\{R_t\}$ and $\mathrm{Var}\{X_t\} = \mathrm{Var}\{R_t\}$. This amounts to $\mu t = ct - m\lambda t$ and $\sigma^2 t = \kappa^2 \lambda t$, or

$$\mu = c - m\lambda, \quad \sigma^2 = \kappa^2 \lambda. \tag{1.2.3}$$
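The moment matching in (1.2.3) can be illustrated by simulation. The sketch below rests on assumptions that are not part of the text: claims are taken to be exponential (so that the second moment equals twice the squared mean), and the numerical values, function names, and seed are hypothetical.

```python
import random


def brownian_parameters(premium_rate, claim_rate, claim_mean, claim_second_moment):
    """Drift and variance parameter from (1.2.3): mu = c - m*lambda, sigma^2 = kappa^2*lambda."""
    mu = premium_rate - claim_mean * claim_rate
    sigma_squared = claim_second_moment * claim_rate
    return mu, sigma_squared


def simulate_surplus(premium_rate, claim_rate, claim_mean, horizon, rng):
    """One draw of R_t = c*t - S_(t) with Poisson arrivals and exponential claims."""
    total_claims = 0.0
    clock = rng.expovariate(claim_rate)              # arrival time of the first claim
    while clock <= horizon:
        total_claims += rng.expovariate(1.0 / claim_mean)
        clock += rng.expovariate(claim_rate)         # next interarrival time
    return premium_rate * horizon - total_claims


# Hypothetical portfolio: c = 110, lambda = 50, exponential claims with mean 2.
c, lam, claim_mean = 110.0, 50.0, 2.0
mu, sigma_squared = brownian_parameters(c, lam, claim_mean, 2.0 * claim_mean ** 2)

rng = random.Random(0)
draws = [simulate_surplus(c, lam, claim_mean, 1.0, rng) for _ in range(4000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
print(f"mu = {mu}, sigma^2 = {sigma_squared}")
print(f"simulated E(R_1) = {sample_mean:.2f}, simulated Var(R_1) = {sample_var:.1f}")
```

The simulated mean and variance of $R_1$ should be close to $\mu$ and $\sigma^2$, which is all that (1.2.3) requires; how close the whole distribution is to normal depends on $\lambda$ being large.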




Because $w_t$ is a continuous process, we can hope for a good approximation only if each
particular claim is “small” but the number of claims during a unit period, i.e., $\lambda$, is “large”.
Assume that this is the case, and to indicate that the $\xi_i$’s are small, let us represent them by
$\xi_i = \delta Y_i$, where the rescaling parameter $\delta$ is viewed as small, while the $Y$’s are “not small” r.v.’s
and do not depend on $\delta$. In such a setup,

$$S_{(t)} = \delta \tilde{S}_{(t)}, \quad \text{where } \tilde{S}_{(t)} = \sum_{i=1}^{N_t} Y_i.$$

Set $\tilde{m} = E\{Y_i\}$, and $\tilde{\kappa}^2 = E\{Y_i^2\}$. By Theorem 3.11, for a fixed $t$ and large $\lambda$’s, the
normalized r.v. $\tilde{S}^*_{(t)} = (\tilde{S}_{(t)} - \tilde{m}\lambda t)/\sqrt{\tilde{\kappa}^2 \lambda t}$ is asymptotically normal, which justifies the
approximation of $R_t$ by the normally distributed r.v. $X_t$.


Route 2 = page 311 


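Let us consider it, however, in more detail. First, note that $m = \delta \tilde{m}$ and $\kappa^2 = \delta^2 \tilde{\kappa}^2$. For a
particular finite $\mu$ and $\sigma$, solving (1.2.3) with respect to $\lambda$ and $c$, we have

$$\lambda = \frac{1}{\delta^2} \cdot \frac{\sigma^2}{\tilde{\kappa}^2}, \quad c = \mu + \frac{1}{\delta} \cdot \frac{\tilde{m}\sigma^2}{\tilde{\kappa}^2}.$$

Then

$$E\{S_{(t)}\} = m\lambda t = \frac{1}{\delta} \cdot \frac{\tilde{m}\sigma^2}{\tilde{\kappa}^2}\, t, \quad \mathrm{Var}\{S_{(t)}\} = \sigma^2 t.$$

Since $\delta$ is small, the parameter $\lambda$ is large, as has been expected. However, we see also
that $E\{S_{(t)}\}$ and the premium are large, so the expected profit $E\{R_t\} = \mu t$ is neither large
nor small only because the large claim is balanced by the large premium. Note also that in
this case, $\mathrm{Var}\{S_{(t)}\}$ is independent of $\delta$.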

This looks somewhat artificial but we should realize that we are talking about a mathe- 
matical approximation. 


In any case, it is worth emphasizing that the model based on a compound process is not 
the only model for a surplus process. In practice, the aggregate claim during even a small 
time period may come from a large number of independent clients and may be closely 
approximated by a normal r.v. by virtue of the CLT. This circumstance itself provides hope 
for a good accuracy of the Brownian motion approximation. 


The phenomenon of the large expectations we discussed above is not surprising from the point of
view of the general theory of processes with independent increments. As has been already noted in
Section 4.1.2.2, any such process may be represented as a Lévy process; that is, a certain
combination of Brownian motion and Poisson processes. These two types of processes are essentially
different. The former represents a continuous component of the process, while the latter describes
possible “jumps” (as in counting processes). From the theory mentioned it follows that no
combination, even infinite, of Poisson processes may lead to a Brownian motion without elimination of
an infinite drift. For the corresponding theory, see, for example, [45], [46], [70].


1.2.2 A reduction to the standard Brownian motion 


In this section, we provide a technical formula useful in many applications. All expecta- 
tions appearing below are assumed to be finite. 


First, consider a normal r.v. $X$ with mean $m$ and, to avoid cumbersome formulas, with
unit variance. We are interested in $E\{g(X)\}$ for a function $g(\cdot)$. Denote by $E_0\{g(X)\}$ the
expectation in the case when $X$ is standard normal and observe that

$$E\{g(X)\} = E_0\{g(X) \exp\{mX - m^2/2\}\}. \tag{1.2.4}$$

Indeed,

$$E\{g(X)\} = \int_{-\infty}^{\infty} g(x) \frac{1}{\sqrt{2\pi}} e^{-(x-m)^2/2}\,dx = \int_{-\infty}^{\infty} g(x) e^{mx - m^2/2} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx = \int_{-\infty}^{\infty} g(x) e^{mx - m^2/2} \varphi(x)\,dx,$$

where $\varphi(x)$ is the standard normal density. This leads to (1.2.4).
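Identity (1.2.4) can be verified numerically. In the sketch below, the test function $g(x) = x^2$, the value $m = 0.7$, the integration range, and the function names are illustrative assumptions; for this $g$ the common value of both sides is $1 + m^2$.

```python
import math


def std_normal_density(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integrate(func, lower=-12.0, upper=12.0, n_points=24001):
    """Trapezoidal rule on a uniform grid; the tails beyond the range are negligible here."""
    h = (upper - lower) / (n_points - 1)
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, n_points - 1):
        total += func(lower + i * h)
    return total * h


def direct_expectation(g, m):
    """E{g(X)} for X normal with mean m and unit variance."""
    return integrate(lambda x: g(x) * std_normal_density(x - m))


def reweighted_expectation(g, m):
    """E_0{g(X) exp{mX - m^2/2}} with X standard normal, the right-hand side of (1.2.4)."""
    return integrate(lambda x: g(x) * math.exp(m * x - 0.5 * m * m) * std_normal_density(x))


m = 0.7
lhs = direct_expectation(lambda x: x * x, m)
rhs = reweighted_expectation(lambda x: x * x, m)
print(f"direct: {lhs:.6f}, reweighted: {rhs:.6f}, exact 1 + m^2: {1 + m * m:.6f}")
```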
Next, we write a counterpart of this formula for the Brownian motion with drift. 


Let $X_t = \mu t + w_t$ (for simplicity, we set $\sigma = 1$). We use the notation $X^t$ for the (random)
function $X_u$, $0 \le u \le t$, the whole trajectory until time $t$. Let $g(X^t)$ be a function of such a
trajectory. Thus, $g(X^t)$ may depend on the whole trajectory. For example, $g(X^t)$ may equal
$\max_{0 \le u \le t} X_u$.


Denote by $E_0\{g(X^t)\}$ the expectation in the case $\mu = 0$, i.e., the case of the standard
Brownian motion.


Proposition 2 For any $t \ge 0$,

$$E\{g(X^t)\} = E_0\{g(X^t) \exp\{\mu X_t - \mu^2 t/2\}\}. \tag{1.2.5}$$


The point here is that the exponent above involves only $X_t$, the value of the process at
the last moment. Proposition 2 is the simplest version of Girsanov’s theorem widely used
in Financial Mathematics. See, e.g., [70]; statements for the continuous and discrete time
cases, as well as detailed comments, may be found in [130].


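Proof. We will consider the case when $g(X^t)$ depends on the values of the process
at two points: $t$ and some $s < t$. The case of an arbitrary number of points is considered
similarly. In the case where $g(X^t)$ depends on values at all points $s \in [0,t]$, one should
apply the limiting argument by using the fact that the trajectory $X^t$ is continuous.


So, let $g(X^t) = g(X_s, X_t)$ for a function $g(\cdot,\cdot)$. The r.v.’s $X_s$ and $X_t - X_s$ are independent,
have means $\mu s$ and $\mu(t - s)$, and variances $s$ and $t - s$, respectively. Then

$$E\{g(X^t)\} = E\{g(X_s, X_t)\} = E\{g(X_s, X_s + X_t - X_s)\}$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, x+y) \frac{1}{\sqrt{2\pi s}} \exp\left\{-\frac{(x - \mu s)^2}{2s}\right\} \frac{1}{\sqrt{2\pi(t-s)}} \exp\left\{-\frac{(y - \mu(t-s))^2}{2(t-s)}\right\} dx\,dy$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, x+y) \exp\left\{\mu(x+y) - \frac{1}{2}\mu^2 t\right\} \frac{1}{\sqrt{2\pi s}} \exp\left\{-\frac{x^2}{2s}\right\} \frac{1}{\sqrt{2\pi(t-s)}} \exp\left\{-\frac{y^2}{2(t-s)}\right\} dx\,dy$$

$$= E_0\left\{g(X_s, X_s + X_t - X_s) \exp\left\{\mu(X_s + X_t - X_s) - \frac{1}{2}\mu^2 t\right\}\right\}$$

$$= E_0\left\{g(X_s, X_t) \exp\left\{\mu X_t - \frac{1}{2}\mu^2 t\right\}\right\}. \quad ∎$$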
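A Monte Carlo experiment illustrates (1.2.5) for a functional that depends on the whole trajectory. This is a sketch under stated assumptions: the path is observed on a finite grid only, $g$ is the maximum over the grid points, and the parameter values, seeds, and function names are hypothetical. For the grid values the reweighting identity holds exactly, by the same computation as in the proof, so the two estimates differ only by simulation error.

```python
import math
import random


def path_max_and_endpoint(mu, t, n_steps, rng):
    """Simulate X_u = mu*u + w_u on a uniform grid; return (max over the grid, X_t)."""
    dt = t / n_steps
    x = 0.0
    running_max = 0.0                  # the grid includes the starting point X_0 = 0
    for _ in range(n_steps):
        x += mu * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        running_max = max(running_max, x)
    return running_max, x


def direct_estimate(mu, t, n_steps, n_paths, seed):
    """Estimate E{g(X^t)} by simulating the process with drift mu."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += path_max_and_endpoint(mu, t, n_steps, rng)[0]
    return total / n_paths


def reweighted_estimate(mu, t, n_steps, n_paths, seed):
    """Estimate E_0{g(X^t) exp{mu*X_t - mu^2*t/2}} by simulating the driftless process."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        path_max, x_t = path_max_and_endpoint(0.0, t, n_steps, rng)
        total += path_max * math.exp(mu * x_t - 0.5 * mu * mu * t)
    return total / n_paths


direct = direct_estimate(mu=0.5, t=1.0, n_steps=20, n_paths=20000, seed=1)
reweighted = reweighted_estimate(mu=0.5, t=1.0, n_steps=20, n_paths=20000, seed=2)
print(f"direct: {direct:.3f}, reweighted: {reweighted:.3f}")
```

Although the maximum depends on the entire path, the weight uses only the endpoint $X_t$, which is the point emphasized after Proposition 2.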


The reader familiar with the notion of a Radon-Nikodym derivative realizes that we have
computed the derivative of the distribution of the Brownian motion with drift with respect to that of the
standard Brownian motion.


1.3 Geometric Brownian motion 


Let us consider a process $Y_t = Y_0 \exp\{X_t\}$, where $Y_0 > 0$ is a certain number, and
$X_t = \mu t + \sigma w_t$, a Brownian motion with drift. The process $Y_t$ is called a geometric Brownian
motion. Since $X_0 = 0$, the number $Y_0$ is the initial value of the process $Y_t$. Because
$\ln(Y_t) = \ln(Y_0) + X_t$, and $X_t$ is normally distributed, $Y_t$ has a log-normal distribution.

The geometric Brownian motion is widely used for modeling investment processes, for
example, the evolution of stock prices. To clarify why the future value of an asset, for
instance, a future stock price, may be closely approximated by a log-normal r.v., let us
consider the following simple model.

Let $S_0$ be the initial price of an asset. The price at the next moment of time (say, on the
next day) may be written as $S_1 = S_0 R_1$, where $R_1$ is the growth factor during the elapsed
period, or the return for this period per unit of money. If $R_1 > 1$, the price has grown; if
$R_1 < 1$, the price has dropped. At the next moment, the price $S_2 = S_1 R_2 = S_0 R_1 R_2$, where $R_2$
is the return in the second period. Continuing in the same fashion, we get that at the end of
the $n$th time period, the value of the asset is the r.v. $S_n = S_0 R_1 \cdots R_n$, where $R_i$ is the return
corresponding to the $i$th time period. Then $\ln S_n = \ln S_0 + \ln R_1 + \ldots + \ln R_n$, which is a
sum of r.v.’s. Consequently, under mild conditions, we can use the CLT and approximate
the distribution of $\ln S_n$ by a normal distribution.

The approach we will use below is based on the infinitesimal argument and shows how to
treat the problem in continuous time. Let $B_0$ be an investment into a risk-free security with
a constant interest rate $r$, and let $B_t$ be the result of investment at time $t$. In this case (see
also Section 0.8.1), the relative growth over an infinitesimally small interval $[t, t + dt]$ is

$$\frac{dB_t}{B_t} = r\,dt, \tag{1.3.1}$$

which leads to the solution $B_t = B_0 e^{rt}$.
Now, let $S_0$ be an investment into a risky asset and $S_t$ be the corresponding result at time
$t$. By analogy, assume that

$$\frac{dS_t}{S_t} = m\,dt + \sigma\,dw_t, \tag{1.3.2}$$

where $m, \sigma$ are parameters, and $dw_t$ is the infinitesimal increment of the standard Brownian
motion over the infinitesimal interval $[t, t + dt]$. The difference between (1.3.1) and (1.3.2)
is that we have added the random component $\sigma\,dw_t$ in (1.3.2).
Since the length of the interval $[t, t + dt]$ is $dt$, the variance of $dw_t$ is equal to $dt$. So, for
the r.v. $dS_t/S_t$, we have

$$E\{dS_t/S_t\} = m\,dt, \quad \mathrm{Var}\{dS_t/S_t\} = \sigma^2\,dt.$$


It is natural to call $m$ the expected return (per unit of time). The quantity $\sigma$ is called a
volatility and is considered a measure of riskiness in this framework.

Solving (1.3.2) is not as easy as solving (1.3.1) since we cannot integrate (1.3.2) directly:
as we know, $w_t$ is not differentiable. The corresponding theory leads to the following
solution:

$$S_t = S_0 \exp\{(m - \sigma^2/2)t + \sigma w_t\}, \tag{1.3.3}$$

that is, to the geometric Brownian motion with $\mu = m - \sigma^2/2$.

Derivations of (1.3.3) at different levels of rigor may be found in almost any textbook on
Financial Mathematics (see, e.g., [29], [62], [130], [135], [145]). All derivations are based
on the famous Itô differentiation formula, first obtained by K. Itô. We omit the proof but
will use (1.3.3) later as an example.
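The role of the correction term $\sigma^2/2$ in (1.3.3) can be seen in a numerical experiment. The sketch below is not the derivation via Itô’s formula to which the text refers; it replaces $dt$ and $dw_t$ in (1.3.2) by small finite increments (an Euler-type scheme) and compares the result with (1.3.3) computed from the same Brownian increments. The parameter values, seed, and function name are hypothetical.

```python
import math
import random


def simulate_endpoints(s0, m, sigma, t, n_steps, seed=0):
    """Return (Euler endpoint, closed form (1.3.3), exponent without the sigma^2/2 term)."""
    rng = random.Random(seed)
    dt = t / n_steps
    euler = s0
    w_t = 0.0
    for _ in range(n_steps):
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)   # increment of w over a step of length dt
        euler *= 1.0 + m * dt + sigma * dw         # discretized dS/S = m dt + sigma dw
        w_t += dw
    closed_form = s0 * math.exp((m - 0.5 * sigma ** 2) * t + sigma * w_t)
    without_correction = s0 * math.exp(m * t + sigma * w_t)
    return euler, closed_form, without_correction


euler, closed_form, without_correction = simulate_endpoints(
    s0=100.0, m=0.08, sigma=0.3, t=1.0, n_steps=10000
)
print(f"step-by-step: {euler:.3f}")
print(f"formula (1.3.3): {closed_form:.3f}")
print(f"without sigma^2/2: {without_correction:.3f}")
```

The step-by-step value agrees with (1.3.3) up to a small discretization error, while the expression without the correction overstates it by the factor $e^{\sigma^2 t/2}$.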


2 MARTINGALES 


In this section, we assume all r.v.’s under consideration to have finite expectations. 


2.1 Two formulas of a general nature 


Throughout this section, we systematically use the notion of conditional expectation 
E{Y |X} introduced in Section 0.7.1 and clarified there and in subsequent chapters. In 
what follows below, the symbol X in E{Y |X} may stand for a random vector as well as for 
a random variable. 

Below, we will repeatedly use the formula for total expectation (see, e.g., (0.7.2.1)) 


E{E{Y |X}} =E{Y}. (2.1.1) 


In addition to (2.1.1), we will need two more simple relations. In the first reading, the 
reader may take these relations at a heuristic level. 




Consider two r.v.’s or r.vec.’s: $X$ and $\tilde{X}$.


If $\tilde{X} = g(X)$ where $g(\cdot)$ is a one-to-one function, then $E\{Y \mid \tilde{X}\} = E\{Y \mid X\}$.   (2.1.2)


To show this, let us recall how we defined the r.v. $E\{Y \mid X\}$. First, we have defined the
regression function $m(x) = E\{Y \mid X = x\}$, and after that, we set $E\{Y \mid X\} = m(X)$. So,
when considering $\tilde{X}$, we define the regression function $\tilde{m}(x) = E\{Y \mid \tilde{X} = x\}$, and we set
$E\{Y \mid \tilde{X}\} = \tilde{m}(\tilde{X})$.

To make it explicit, let $X$, $\tilde{X}$ be r.v.’s, and $g(x) = x^3$. The general case is considered
absolutely similarly, and the reader is invited to do it on her/his own. So, let $\tilde{X} = X^3$. We
have

$$\tilde{m}(x) = E\{Y \mid \tilde{X} = x\} = E\{Y \mid X^3 = x\} = E\{Y \mid X = x^{1/3}\} = m(x^{1/3}).$$

Then,

$$E\{Y \mid \tilde{X}\} = \tilde{m}(\tilde{X}) = m(\tilde{X}^{1/3}) = m(X) = E\{Y \mid X\}.$$

If $g(\cdot)$ is not a one-to-one function, then different values of $X$ may correspond to the
same value of $\tilde{X}$, and (2.1.2) may not be true. However, we can proceed as follows.

First, let us look again at (2.1.1). In the l.-h.s., we first compute the expected value
of $Y$ given additional information (about the value of $X$), and after that, we compute the
expected value of the conditional expectation. Such a procedure leads to the unconditional
expectation in the r.-h.s.

Since conditional expectations inherit the main properties of “usual” expectations, we
can write a counterpart of (2.1.1) for conditional expectations. In particular, we can replace
the expectation $E\{\cdot\}$ in (2.1.1) by the conditional expectation $E\{\cdot \mid \tilde{X}\}$. The only thing
we need for such a generalization to be true is that the information on which the interior
conditional expectation is based should be either more detailed than, or at least equal to,
the information based on values of $\tilde{X}$.

If $\tilde{X} = g(X)$, this requirement is fulfilled because given $X$, we know exactly which value
$\tilde{X}$ has assumed (but perhaps not vice versa). Thus,


If $\tilde{X} = g(X)$ where $g(\cdot)$ is a function, then $E\{Y \mid \tilde{X}\} = E\{E\{Y \mid X\} \mid \tilde{X}\}$.   (2.1.3)


(We first condition $Y$ on $X$, and after that, on $\tilde{X}$.)
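Relation (2.1.3) can be checked exactly on a small discrete example. Everything in the sketch below is a hypothetical illustration: the joint distribution of $(X, Y)$, the function $g(x) = x^2$ (which is not one-to-one), and the helper names. Exact rational arithmetic is used so that the two sides can be compared without rounding.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint distribution: (x, y) -> P(X = x, Y = y).
joint = {
    (-1, 0): Fraction(1, 8),
    (-1, 2): Fraction(1, 8),
    (1, 4): Fraction(1, 4),
    (2, 3): Fraction(1, 4),
    (-2, 5): Fraction(1, 4),
}


def g(x):
    """A function that is not one-to-one: x and -x give the same value."""
    return x * x


def conditional_expectation(key_of, value_of):
    """Return {k: E[value_of(X, Y) | key_of(X) = k]} computed from the joint distribution."""
    numerator = defaultdict(Fraction)
    denominator = defaultdict(Fraction)
    for (x, y), p in joint.items():
        k = key_of(x)
        numerator[k] += p * value_of(x, y)
        denominator[k] += p
    return {k: numerator[k] / denominator[k] for k in denominator}


m = conditional_expectation(lambda x: x, lambda x, y: y)       # m(x) = E{Y | X = x}
lhs = conditional_expectation(g, lambda x, y: y)               # E{Y | g(X) = z}
rhs = conditional_expectation(g, lambda x, y: m[x])            # E{E{Y | X} | g(X) = z}

print("E{Y | X = x}:", m)
print("E{Y | g(X) = z}:", lhs)
print("E{E{Y | X} | g(X) = z}:", rhs)
```

The last two dictionaries coincide, as (2.1.3) asserts, while $E\{Y \mid X = -1\}$ differs from $E\{Y \mid g(X) = 1\}$, which shows that (2.1.2) may indeed fail when $g$ is not one-to-one.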

If $g(\cdot)$ is a one-to-one function, then (2.1.3) is trivial: in view of (2.1.2), the l.-h.s. in
(2.1.3) is equal to $E\{Y \mid X\}$, and the r.-h.s. equals $E\{E\{Y \mid X\} \mid X\} = E\{Y \mid X\}$.

We proceed to random processes. 


2.2 Martingales: General properties and examples 


Beginning to build a general framework, we presuppose the existence of an original basic
process $\xi_t$ on which all other processes under consideration depend. We may interpret $\xi_t$ as
a global characteristic of the “state of nature” at time $t$. In general, $\xi_t$ may take on values
from an arbitrary space; for example, $\xi_t$ may be a vector process.

We will use the notation $\xi^t$ for the whole trajectory $\xi_s$, $0 \le s \le t$, through time $t$, and
sometimes call it the history of the process by time $t$. So, the conditional expectation of a
r.v. given $\xi^t$ is the conditional expectation given the entire history through time $t$. In the
case of discrete time, $\xi^t$ is just a sequence $\xi_0, \xi_1, \ldots, \xi_t$. The exposition below is designed
in a way that the notation $\xi^t$ will not cause confusion with the power symbol.

For all other processes $X_t$ to be considered, we assume that for each $t$, the r.v. $X_t$ is
completely determined by values of $\xi^t$. In other words, $X_t$ is a function of $\xi^t$. We say that
$X_t$ is adapted to $\xi_t$.

Let, for example, each $\xi_t$ take on real values and time $t$ be discrete. Let $X_0 = \xi_0$,
$X_1 = \xi_0 \xi_1$, $X_2 = \xi_0 \xi_1 \xi_2$, and so on: $X_t = \xi_0 \cdots \xi_t$. Clearly, the process $X_t$ satisfies the above
condition.

Now, note that since given $\xi^t$ the value of $X_t$ is known,

$$E\{X_t \mid \xi^t\} = X_t. \tag{2.2.1}$$

The process $X_t$ is called a martingale with respect to the basic process $\xi_t$ if for all $t, s \ge 0$,

$$E\{X_{t+s} \mid \xi^t\} = X_t. \tag{2.2.2}$$

When considering martingales, it is convenient to use the game or investment interpretation
and view $X_t$ as the total profit (perhaps negative) of a player or an investor by time $t$.
In this case, definition (2.2.2) means that if $t$ is the present time, then on the average, the
future profit $X_{t+s}$ is equal to what the player has already reached at time $t$.

Since the basic process is fixed, we will often omit the reference to $\xi_t$, just calling $X_t$ a
martingale. However, it is important to emphasize that if (2.2.2) holds, the process $X_t$ is a
martingale with respect to itself also, that is, for all $t, s \ge 0$,

$$E\{X_{t+s} \mid X^t\} = X_t, \tag{2.2.3}$$

where $X^t = (X_0, \ldots, X_t)$, the history of the process by time $t$. We use the same symbolism
as for $\xi^t$ and will do the same for other processes below.

To justify (2.2.3), we recall that $X^t$ is a function of $\xi^t$. Hence, by general rule (2.1.3),
$E\{X_{t+s} \mid X^t\} = E\{E\{X_{t+s} \mid \xi^t\} \mid X^t\} = E\{X_t \mid X^t\} = X_t$.

Therefore, if we work with a process $X_t$ having property (2.2.3), we can take as a basic
process the process $X_t$ itself. By convention, we will do it each time when in a particular
problem the basic process is not specified. However, in general, it is reasonable to suppose
that there is one basic process on which all other processes under consideration depend.

In the discrete time case, together with the process $X_t$, it is also convenient to consider
the process of the increments of $X_t$. More precisely, set $Z_0 = 0$, and $Z_{t+1} = X_{t+1} - X_t$, the
profit in one step (play) after time $t = 0, 1, \ldots$. Then, in view of (2.2.1) and (2.2.2),

$$E\{Z_{t+1} \mid \xi^t\} = E\{X_{t+1} - X_t \mid \xi^t\} = E\{X_{t+1} \mid \xi^t\} - E\{X_t \mid \xi^t\} = X_t - X_t = 0.$$

In the game interpretation, this means that regardless of the past, the profit in one step
equals zero on the average. We call such a game fair.


R.v.’s $Z_t$ for which

$$E\{Z_{t+1} \mid \xi^t\} = 0 \quad \text{for all } t = 0, 1, \ldots \tag{2.2.4}$$

are called martingale differences with respect to $\xi_t$. Since all processes under consideration,
including $Z_t$, depend on $\xi_t$, we will again omit the reference to $\xi_t$.
Like martingales, martingale differences are martingale differences with respect to
themselves too, that is,

$$E\{Z_{t+1} \mid Z^t\} = 0 \quad \text{for all } t = 0, 1, \ldots. \tag{2.2.5}$$

The proof is similar to the proof of (2.2.3).
Note that from (2.2.4) it follows, in particular, that

$$E\{Z_t\} = 0 \quad \text{for all } t = 0, 1, \ldots. \tag{2.2.6}$$

(Since, by the general property (2.1.1), $E\{Z_{t+1}\} = E\{E\{Z_{t+1} \mid \xi^t\}\} = E\{0\} = 0$.)
Now, if $Z_{t+1} = X_{t+1} - X_t$, then

$$X_t = X_0 + (X_1 - X_0) + (X_2 - X_1) + \ldots + (X_t - X_{t-1}) = X_0 + Z_1 + \ldots + Z_t. \tag{2.2.7}$$

The interpretation is clear: the total profit is equal to the initial capital plus the sum of all
profits in the previous plays.

Thus, a martingale may be represented as the sum of martingale differences plus the
initial value. The converse assertion is also true. Let $X_t$ be equal to the very right-hand side
of (2.2.7) where the $Z_t$’s are martingale differences. Then

$$E\{X_{t+1} \mid \xi^t\} = E\{X_t + Z_{t+1} \mid \xi^t\} = E\{X_t \mid \xi^t\} + E\{Z_{t+1} \mid \xi^t\} = X_t + 0 = X_t.$$

Thus, we have proved (2.2.2) for $s = 1$. In the discrete time case, this implies (2.2.2) for
any natural $s$. Indeed, again in view of the property of conditional expectations (2.1.3),

$$E\{X_{t+s} \mid \xi^t\} = E\{E\{X_{t+s} \mid \xi^{t+s-1}\} \mid \xi^t\} = E\{X_{t+s-1} \mid \xi^t\} = \ldots = E\{X_{t+1} \mid \xi^t\} = X_t.$$


The reader is also recommended to solve Exercise 19. 
Thus, in the discrete time case, 


A random sequence $X_t$ is a martingale if and only if it may be represented
by (2.2.7), where $\{Z_t\}$ is a sequence of martingale differences.


We now proceed to examples. The first two are trivial and the third is also very simple, 
but they shed some light on the nature of martingales. 


EXAMPLE 1. Let time $t$ be discrete and the process $S_t = \xi_1 + \ldots + \xi_t$, where the $\xi$’s are independent and $E\{\xi_i\} = 0$ for all $i$. Then, in view of the independence condition, $E\{\xi_{t+1} \mid \xi^t\} = E\{\xi_{t+1}\} = 0$. So, the $\xi_t$’s are martingale differences and, consequently, $S_t$ is a martingale. We see that the notion of martingale in discrete time is a generalization of the sum of independent variables with zero means. In (2.2.7), we do not require the $Z$’s to be independent, but rather to have zero conditional expectations $E\{Z_{t+1} \mid Z^t\}$.


Now let $E\{\xi_i\} = m \neq 0$ in our example. Then $S_t$ is not a martingale. However,
$$\text{the process } X_t = S_t - mt \text{ is a martingale.} \tag{2.2.8}$$

Indeed, $S_t - mt = \sum_{i=1}^{t} (\xi_i - m)$, and $E\{\xi_i - m\} = 0$. This simple observation will lead to some non-trivial conclusions in Section 2.4.2.
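
Assertion (2.2.8) can be checked exactly on a small case. The sketch below is only an illustration under stated assumptions: the two-point step distribution `STEP_DIST` and the helper names are hypothetical and do not come from the text. It enumerates every history of a given length and compares the conditional expectation of the next value with the current value.

```python
from fractions import Fraction
from itertools import product

# Hypothetical step distribution (illustration only): value -> probability.
STEP_DIST = {Fraction(2): Fraction(1, 2), Fraction(-1): Fraction(1, 2)}
M = sum(value * prob for value, prob in STEP_DIST.items())  # m = E{xi_i}


def raw_sum(history):
    """S_t for the history (xi_1, ..., xi_t)."""
    return sum(history, Fraction(0))


def centered_sum(history):
    """X_t = S_t - m*t."""
    return raw_sum(history) - M * len(history)


def is_martingale(process, horizon):
    """Check E{X_{t+1} | xi^t} = X_t on every history of length t < horizon."""
    for t in range(horizon):
        for history in product(STEP_DIST, repeat=t):
            # The next step is independent of the history, so we average over it.
            expected_next = sum(
                prob * process(history + (value,))
                for value, prob in STEP_DIST.items()
            )
            if expected_next != process(history):
                return False
    return True
```

Calling `is_martingale(raw_sum, 4)` and `is_martingale(centered_sum, 4)` contrasts the uncentered and the centered sums for this particular distribution; exact fractions are used so that the comparison does not depend on rounding.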


A generalization of (2.2.8) looks as follows. 


EXAMPLE 2. Let $\xi_t$ be a process with independent increments in continuous or discrete time. Let $m_t = E\{\xi_t\}$ and $X_t = \xi_t - m_t$. We will show that $X_t$ is a martingale.

Denote by $X_{(t,t+s]} = X_{t+s} - X_t$ the increment over the interval $(t, t+s]$. Since $E\{X_t\} = 0$ for any $t \ge 0$, we have $E\{X_{(t,t+s]}\} = E\{X_{t+s}\} - E\{X_t\} = 0$. Then $E\{X_{t+s} \mid \xi^t\} = E\{X_t + X_{(t,t+s]} \mid \xi^t\} = X_t + E\{X_{(t,t+s]} \mid \xi^t\}$. Since the increment $X_{(t,t+s]}$ does not depend on $\xi^t$, we have $E\{X_{(t,t+s]} \mid \xi^t\} = E\{X_{(t,t+s]}\} = 0$, and hence $E\{X_{t+s} \mid \xi^t\} = X_t$.

For instance, if $N_t$ is a homogeneous Poisson process with parameter $\lambda$, then $X_t = N_t - \lambda t$ is a martingale. A further generalization of the procedure above will be considered at the end of this subsection.


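EXAMPLE 3. Let time be discrete, and let $X_t = C_t \cdot \xi_1 \cdot \ldots \cdot \xi_t$, where the $\xi$’s are i.i.d. positive r.v.’s, and $C_t$ is a constant. What value of $C_t$ will make $X_t$ a martingale?
The answer is simple. We should set $C_t = m^{-t}$, where $m = E\{\xi_i\}$. Indeed, in this case,

$$E\{X_{t+s} \mid \xi^t\} = E\{m^{-(t+s)} \xi_1 \cdots \xi_{t+s} \mid \xi^t\} = m^{-(t+s)} E\{(\xi_1 \cdots \xi_t)(\xi_{t+1} \cdots \xi_{t+s}) \mid \xi^t\}$$
$$= m^{-(t+s)} \xi_1 \cdots \xi_t \, E\{\xi_{t+1} \cdots \xi_{t+s} \mid \xi^t\} = m^{-s} X_t \, E\{\xi_{t+1} \cdots \xi_{t+s}\}$$
$$= m^{-s} X_t m^{s} = X_t.$$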


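EXAMPLE 4. Let time be discrete, and let $\xi_1, \xi_2, \ldots$ be i.i.d. r.v.’s with zero means. Consider a bilinear form of the $\xi$’s; more precisely, set

$$X_t = \sum_{1 \le i < j \le t} \xi_i \xi_j = \sum_{j=2}^{t} \xi_j \sum_{i=1}^{j-1} \xi_i.$$

Then $X_t$ is a martingale. To prove this, we will show that $Z_{t+1} = X_{t+1} - X_t$ is a martingale difference. We have $Z_{t+1} = \sum_{i=1}^{t} \xi_i \xi_{t+1} = \xi_{t+1} \sum_{i=1}^{t} \xi_i$. Hence,

$$E\{Z_{t+1} \mid \xi^t\} = \Big(\sum_{i=1}^{t} \xi_i\Big) E\{\xi_{t+1} \mid \xi^t\} = \Big(\sum_{i=1}^{t} \xi_i\Big) E\{\xi_{t+1}\} = 0.$$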


EXAMPLE 5. Let $S_t$, $t = 0, 1$, or $2$, be a stock price at three consecutive moments of time. All possible outcomes are shown in Fig.3a. The initial price is 10. At the end of the first period, the price may be either 14 or 8. If at time $t = 1$ the price turns out to be 14, its next value may be either 15 or 11, while if $S_1$ takes on a value of 8, at the end of the second time step the price may be either 10 or 5. Such models of price evolution are called binomial tree models. Find a probability measure $P$ under which the process $S_t$ is a martingale. In Financial Mathematics, such a measure is also called a risk neutral measure. It is widely used in pricing various financial products in financial markets.

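Let $p = P(S_2 = 15 \mid S_1 = 14)$ and $p' = P(S_2 = 10 \mid S_1 = 8)$ (these are the probabilities of “moving up” starting from 14 and 8, respectively; see also Fig.3a). In order to have $E\{S_2 \mid S_1\} = S_1$, we need $15 \cdot p + 11 \cdot (1 - p) = 14$ and $10 \cdot p' + 5 \cdot (1 - p') = 8$, which gives $p = 3/4$ and $p' = 3/5$; see Fig.3b.

For $E\{S_1 \mid S_0\} = S_0$, we should have $14 \cdot p'' + 8 \cdot (1 - p'') = 10$, where $p'' = P(S_1 = 14)$, which gives $p'' = 1/3$. The probabilities $p$, $p'$, $p''$ completely specify the probabilities of all four possible paths in the tree: the probability of the path $10 \to 14 \to 15$ is $\frac{1}{3} \cdot \frac{3}{4} = \frac{1}{4}$; for the path $10 \to 14 \to 11$, it is $\frac{1}{3} \cdot \frac{1}{4} = \frac{1}{12}$; the probabilities for the two remaining paths are $\frac{2}{3} \cdot \frac{3}{5} = \frac{2}{5}$ and $\frac{2}{3} \cdot \frac{2}{5} = \frac{4}{15}$. See also Fig.3b.

[FIGURE 3. Binomial tree of the stock price: 10 → 14 or 8; 14 → 15 or 11; 8 → 10 or 5. Panel (a): the tree of prices. Panel (b): the same tree with the branch probabilities 1/3 and 2/3 (from 10), 3/4 and 1/4 (from 14), 3/5 and 2/5 (from 8).]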
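
The arithmetic of Example 5 can be reproduced with exact fractions. The following is a minimal sketch; the function name `up_probability` and the dictionary layout are illustrative assumptions, not notation from the text. It solves the one-step martingale equation at each node and then multiplies the branch probabilities along each path.

```python
from fractions import Fraction


def up_probability(current, up, down):
    """Probability p of the up-move solving up*p + down*(1 - p) = current."""
    return Fraction(current - down, up - down)


# Tree from Example 5: 10 -> {14, 8}, 14 -> {15, 11}, 8 -> {10, 5}.
p_root = up_probability(10, 14, 8)       # P(S_1 = 14)
p_from_14 = up_probability(14, 15, 11)   # P(S_2 = 15 | S_1 = 14)
p_from_8 = up_probability(8, 10, 5)      # P(S_2 = 10 | S_1 = 8)

# Probability of each path is the product of the branch probabilities.
path_probabilities = {
    (10, 14, 15): p_root * p_from_14,
    (10, 14, 11): p_root * (1 - p_from_14),
    (10, 8, 10): (1 - p_root) * p_from_8,
    (10, 8, 5): (1 - p_root) * (1 - p_from_8),
}

# Under the martingale measure, E{S_2} must equal the initial price.
expected_s2 = sum(path[-1] * prob for path, prob in path_probabilities.items())
```

The same helper applies to any node of a binomial tree as long as the current price lies between the two possible next prices; otherwise the solution is not a probability.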

EXAMPLE 6 is noteworthy, deep, and concerns both discrete and continuous time. Let $\xi_t$ be any process, and $V$ be a r.v. which may be viewed as a global characteristic of the process $\xi_t$. For instance, $V = \max_{0 \le t < \infty} \xi_t$ if such a maximum is finite, or $V$ is the first moment when $\xi_t$ reaches a particular level. Let $X_t = E\{V \mid \xi^t\}$, the conditional expected value of $V$ given the information about the evolution of the process $\xi_t$ until time $t$. In this setup, the process $X_t$ is a martingale.

To prove it, it suffices to apply again property (2.1.3), from which it follows that

$$E\{X_{t+s} \mid \xi^t\} = E\{E\{V \mid \xi^{t+s}\} \mid \xi^t\} = E\{V \mid \xi^t\} = X_t.$$


The idea of the next example is similar to the idea of Example 3. Since this example is 
used in a number of problems below, we present it as a proposition. 


Proposition 3 Let $Y_t = Y_0 \exp\{\mu t + \sigma w_t\}$ be a geometric Brownian motion, where a positive $Y_0$ is certain, and let $\alpha = \mu + \sigma^2/2$. Then the process $Z_t = e^{-\alpha t} Y_t$ is a martingale with respect to $w_t$.


Note that since there is a one-to-one correspondence between the processes $Y_t$, $Z_t$, and $w_t$, it does not matter whether we condition on $Y^t$, or $Z^t$, or $w^t$, and hence $Z_t$ is also a martingale with respect to $Y_t$ and itself. We will see that it is more convenient to condition on $w^t$.

Furthermore, $Z_t = Y_0 \exp\{-\mu t - (\sigma^2 t/2) + \mu t + \sigma w_t\} = Y_0 \exp\{-\sigma^2 t/2 + \sigma w_t\}$; that is, the $\mu t$ terms cancel out. We use this fact in the proof below, but for future reference, it is convenient to have Proposition 3 as stated above.


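Proof. Denote by $w_{(t,t+s]} = w_{t+s} - w_t$ the increment over $(t, t+s]$. Since $w_t$ is a process with independent increments,

$$E\{Z_{t+s} \mid w^t\} = Y_0 \exp\{-\sigma^2 (t+s)/2\} \cdot E\{\exp\{\sigma w_{t+s}\} \mid w^t\}$$
$$= Y_0 \exp\{-\sigma^2 (t+s)/2\} \cdot E\{\exp\{\sigma (w_t + w_{(t,t+s]})\} \mid w^t\}$$
$$= Y_0 \exp\{-\sigma^2 (t+s)/2\} \cdot \exp\{\sigma w_t\} \cdot E\{\exp\{\sigma w_{(t,t+s]}\} \mid w^t\}$$
$$= Y_0 \exp\{-\sigma^2 t/2 + \sigma w_t\} \cdot \exp\{-\sigma^2 s/2\} \cdot E\{\exp\{\sigma w_{(t,t+s]}\}\}$$
$$= Z_t \cdot \exp\{-\sigma^2 s/2\} \cdot E\{\exp\{\sigma w_{(t,t+s]}\}\}. \tag{2.2.9}$$

The last expectation is equal to the value of the moment generating function of the r.v. $w_{(t,t+s]}$ at the point $\sigma$. The r.v. $w_{(t,t+s]}$ is normal with zero mean and variance $s$. Hence [see (0.4.3.6)], $E\{\exp\{\sigma w_{(t,t+s]}\}\} = \exp\{\sigma^2 s/2\}$, and (2.2.9) implies the assertion of the proposition. ∎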
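
The key step of the proof is the moment generating function identity $E\{\exp\{\sigma w\}\} = \exp\{\sigma^2 s/2\}$ for a normal r.v. $w$ with zero mean and variance $s$. The sketch below checks it numerically with a plain trapezoidal rule, and also checks the consequence $E\{Z_t\} = Y_0$. The function names, the integration range, and the number of grid points are assumptions chosen for illustration; this is a numerical sanity check, not a proof of the martingale property.

```python
import math


def normal_mgf_numeric(theta, variance, half_width=12.0, steps=20000):
    """Approximate E{exp(theta*W)} for W ~ N(0, variance) by the trapezoidal rule."""
    sd = math.sqrt(variance)
    lo, hi = -half_width * sd, half_width * sd
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        density = math.exp(-x * x / (2 * variance)) / (sd * math.sqrt(2 * math.pi))
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(theta * x) * density
    return total * h


def expected_discounted_gbm(y0, mu, sigma, t):
    """E{Z_t} for Z_t = exp(-alpha*t) * Y_t with alpha = mu + sigma^2/2."""
    alpha = mu + sigma ** 2 / 2
    # E{Y_t} = y0 * exp(mu*t) * E{exp(sigma * w_t)}, where w_t ~ N(0, t).
    return y0 * math.exp(-alpha * t) * math.exp(mu * t) * normal_mgf_numeric(sigma, t)
```

For any admissible inputs, `expected_discounted_gbm(y0, mu, sigma, t)` should stay close to `y0`, in agreement with the fact that the $\mu t$ terms cancel out.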


EXAMPLE 7. Consider the stock price process $S_t$ from (1.3.3). Assume that the risk free interest in the market is $r$ and is compounded continuously. In this situation, from the standpoint of time 0, the present value of the stock price at time $t$ is $W_t = e^{-rt} S_t$; see Section 0.8.3. For the process $S_t$, the role of $\alpha$ from Proposition 3 is played by $(m - \frac{1}{2}\sigma^2) + \frac{1}{2}\sigma^2 = m$. Consequently, for the process $W_t$ to be a martingale, we should set $m = r$.

In summary, for $W_t$ to be a martingale, the expected return $m$ should be equal to the risk free interest $r$. Such a situation is often referred to as a “risk neutral world” for the following reason.

Let $m = r$. Since $W_t$ is a martingale,

$$E\{W_{t+s} \mid W^t\} = W_t. \tag{2.2.10}$$

Suppose that the present time is the initial time $t = 0$, and we are speculating about possible values of the future price. When comparing the possible prices at two different future moments of time, $t$ and $t + s$, we should not compare the prices themselves ($S_t$ and $S_{t+s}$) but their present values from the standpoint of the initial time ($W_t$ and $W_{t+s}$). Relation (2.2.10) means that, on the average, the present value keeps the level it has already reached. In other words, whatever value $W_t$ assumes, given this value, the conditional expected value of $W_{t+s}$ will be equal to $W_t$.


In conclusion, we establish one simple but important property of martingales. 
Proposition 4 If $X_t$ is a martingale, then for any $t \ge 0$,
$$E\{X_t\} = E\{X_0\}. \tag{2.2.11}$$
(For a martingale, expected values do not change in time.)


Proof. Setting $t = 0$ in (2.2.3) and taking into account that $X^0 = X_0$, we have $E\{X_s \mid X_0\} = X_0$. Computing the expected values of both sides, by virtue of the basic property (2.1.1), we get that $E\{X_s\} = E\{X_0\}$. It remains to replace $s$ by $t$. ∎


EXAMPLE 8. Let us revisit Example 7. By Proposition 4, from (2.2.10) it follows that $E\{W_t\} = E\{W_0\}$. On the other hand, $W_0 = S_0$, the initial price, and $W_0$ is not random. Eventually, $E\{W_t\} = S_0$. Thus, in the situation of Example 7, the prices themselves change even on the average, but the mean present values of the future prices do not change.


Next, we generalize the procedure from Examples 1-2 and consider how to make any process a martingale. Let $\xi_t$ be an arbitrary process with finite expectations in discrete time $t = 0, 1, 2, \ldots$, and $V_t = \xi_0 + \ldots + \xi_t$. Note that any process $V_t$ may be represented in this way: it suffices to set $\xi_0 = V_0$ and $\xi_t = V_t - V_{t-1}$, the increment over the period $(t-1, t]$. Let us set $A_t = E\{\xi_t \mid \xi^{t-1}\}$ and $B_t = A_1 + \ldots + A_t$. Then the process $M_t = V_t - B_t$ is a martingale.

Indeed, we can write $M_t = \xi_0 + \sum_{k=1}^{t} Z_k$, where $Z_k = \xi_k - A_k$. On the other hand, $E\{Z_t \mid \xi^{t-1}\} = E\{\xi_t \mid \xi^{t-1}\} - E\{E\{\xi_t \mid \xi^{t-1}\} \mid \xi^{t-1}\} = E\{\xi_t \mid \xi^{t-1}\} - E\{\xi_t \mid \xi^{t-1}\} = 0$; that is, the $Z_t$’s are martingale differences.

In Example 2, to make a process with independent increments a martingale, we subtracted the corresponding expectations. We see that in general, we can do the same if we subtract conditional expectations. The process $B_t$ is called a compensator, and the representation $V_t = M_t + B_t$ is called Doob’s decomposition.
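
As an illustration of Doob’s decomposition, take $V_t = S_t^2$, where $S_t$ is the symmetric random walk with steps $\pm 1$. This particular choice, and the decision to condition on the history of the steps as the basic process, are assumptions made for the sketch below; the function names are hypothetical. The code computes the predictable increments by averaging over the next step and checks by enumeration that $M_t = V_t - B_t$ is a martingale.

```python
from itertools import product

STEPS = (1, -1)  # symmetric walk: each step has probability 1/2


def v(history):
    """V_t = S_t^2, where S_t is the sum of the steps in `history`."""
    return sum(history) ** 2


def predictable_increment(history):
    """A_{t+1} = E{V_{t+1} - V_t | history}, averaging over the next step."""
    return sum(v(history + (e,)) - v(history) for e in STEPS) / len(STEPS)


def compensator(history):
    """B_t = A_1 + ... + A_t along the given history."""
    return sum(predictable_increment(history[:k]) for k in range(len(history)))


def m(history):
    """M_t = V_t - B_t, the martingale part of the decomposition."""
    return v(history) - compensator(history)


def is_martingale(process, horizon):
    """Check E{process_{t+1} | steps up to t} = process_t for all t < horizon."""
    for t in range(horizon):
        for history in product(STEPS, repeat=t):
            average_next = sum(process(history + (e,)) for e in STEPS) / len(STEPS)
            if average_next != process(history):
                return False
    return True
```

In this example every predictable increment equals one, so the compensator is $B_t = t$ and the martingale part is $S_t^2 - t$.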


2.3 Martingale transform 


Let $X_t$, $t = 0, 1, 2, \ldots$, be a martingale with respect to a basic process $\{\xi_t\}$, and let $Z_{t+1} = X_{t+1} - X_t$, $t = 0, 1, 2, \ldots$, be the corresponding sequence of martingale differences. Then $X_t = X_0 + Z_1 + \ldots + Z_t$.

Consider another sequence of r.v.’s $\{Y_t\}$ such that for each $t$, the r.v. $Y_{t+1}$ is a function of $\xi^t$. Such a sequence is called predictable. To clarify the significance of this definition, assume that $t$ is the present time and $\xi^t$, that is, the evolution of the basic process until time $t$, is known. From the standpoint of time $t$, the future value $X_{t+1}$ is random, still unknown, while the value of $Y_{t+1}$, being a function of $\xi^t$, is known or “predictable”. (For example, the sequence $A_{t+1} = E\{\xi_{t+1} \mid \xi^t\}$ at the end of the previous section depends only on $\xi^t$ and hence is predictable.)

The process

$$W_t = X_0 + Y_1 Z_1 + \ldots + Y_t Z_t \tag{2.3.1}$$
is called a martingale transform.


EXAMPLE 1. A player participates in a sequence of independent plays (turns). At each turn, success and failure are equally likely. Let a r.v. $\xi_t$ take the value $+1$ if the turn $t$ is successful, and $-1$ otherwise. So, $\xi_t = \pm 1$ with equal probabilities. Let $X_0 = 0$ and $X_t = \xi_1 + \ldots + \xi_t$, the difference between the number of successful plays and the number of non-successful plays by time $t$. The process $X_t$ is a martingale, and the corresponding martingale differences are $Z_t = \xi_t$.

Suppose that after the play $t$ is over, the player makes a bet of $Y_{t+1}$ at the next play $t + 1$. The player is free to choose any $Y_{t+1}$ depending on known results, so $Y_{t+1}$ depends on $\xi^t$.

For example, assume that the player bets \$1 if she/he lost the last game, and the player increases the stake by \$1 if she/he won. In this case, $Y_{t+1} = 1$ if $\xi_t = -1$, and $Y_{t+1} = Y_t + 1$ if $\xi_t = 1$.




As another example, assume that the player determines at what moment she/he will quit playing. Suppose that the rule of quitting is such that “to play or not to play” after a time $t$ is completely determined by the previous history of plays, that is, by $\xi^t$. In this case, the moment of quitting is a r.v. $\tau$, and the event $\{\tau \le t\}$ is completely determined by $\xi^t$. We may describe such a situation by setting $Y_t = 0$ for $t > \tau$. (To quit means to bet zero.) In particular, whether $Y_{t+1} = 0$ (the player quits after time $t$) depends on $\xi^t$.

Let us return to the general case of arbitrary predictable stakes $Y_t$. Clearly, the total profit is given by (2.3.1) with $Z_t = \xi_t$.

Generations of gamblers tried to find a sequence $Y_t$ (a betting strategy) which would transform games from fair (or non-favorable) into favorable, that is, into games for which $E\{W_{t+1} \mid \xi^t\} > W_t$.


Proposition 5 The transform $W_t$ in (2.3.1) is a martingale.


Proof. Since $Y_{t+1}$ depends only on $\xi^t$, we have $E\{Y_{t+1} Z_{t+1} \mid \xi^t\} = Y_{t+1} \cdot E\{Z_{t+1} \mid \xi^t\} = 0$ because $Z_{t+1}$ is a martingale difference. Hence, the sequence $Y_t Z_t$ is that of martingale differences and, consequently, $W_t$ is a martingale. ∎
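
Proposition 5 can be illustrated on the betting rule of Example 1 (stake 1 after a loss, stake raised by 1 after a win). The sketch below enumerates all equally likely outcome sequences of a fair game and computes the expected profit exactly. The first stake $Y_1 = 1$ is an assumption, since the example does not fix it, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def stakes(outcomes):
    """Predictable stakes Y_1..Y_t: Y_{k+1} is computed from xi_1..xi_k only."""
    bets = []
    current = 1  # assumption: the first stake Y_1 equals 1
    for xi in outcomes:
        bets.append(current)
        # After a win the stake grows by 1; after a loss it is reset to 1.
        current = current + 1 if xi == 1 else 1
    return bets


def profit(outcomes):
    """W_t = Y_1*xi_1 + ... + Y_t*xi_t (with X_0 = 0)."""
    return sum(y * xi for y, xi in zip(stakes(outcomes), outcomes))


def expected_profit(t):
    """E{W_t} for a fair game: all 2^t outcome sequences are equally likely."""
    paths = list(product((1, -1), repeat=t))
    return Fraction(sum(profit(path) for path in paths), len(paths))
```

Replacing the body of `stakes` by any other rule that looks only at past outcomes should leave `expected_profit(t)` unchanged, which is the point of the proposition.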


2.4 Optional stopping time and some applications 
2.4.1 Definitions and examples 


We begin with the discrete time case. Consider a process $\xi_t$, $t = 0, 1, \ldots$, and an integer valued r.v. $\tau$ such that


The event $\{\tau \le t\}$ is completely determined
by the values of the r.v.’s $\xi_0, \ldots, \xi_t$. (2.4.1)


The r.v. $\tau$ with property (2.4.1) is called a Markov time, or an optional time. Sometimes instead of “time” we will say “moment”, meaning a time moment.

If in a particular problem we deal with one process, say, $X_t$, then we can choose as a basic process the process $X_t$ itself, and in this case, in definition (2.4.1), we may replace the $\xi$’s by $X$’s.

The r.v. $\tau$ is interpreted as the moment of time when a certain event connected with the evolution of the process $\xi_t$ occurs. Condition (2.4.1) means that, if $t$ is the present time and if we know the history of the process until time $t$, then we do know whether the event mentioned occurred before or at time $t$.

A typical example is a hitting time, for example, the time when a process $\xi_t$ first reaches a level $a$. In this case, $\tau = \min\{t : \xi_t \ge a\}$.

Another good example is the time when, starting from zero, $\xi_t$ returns to zero.

As one more example, suppose that the $\xi$’s are integer valued r.v.’s, and $\tau$ is the moment when the process takes on an odd value for the first time. Then $\tau$ is an optional time.

A typical example where a r.v. $\tau$ is not an optional time is the moment when the process attains its maximum over a certain period. More precisely, for a fixed $T$, consider the r.v.


$$\tau = \min\{t \le T : \xi_t = \max_{s \le T} \xi_s\}. \tag{2.4.2}$$




In order to determine whether a moment $t$ is a point of maximum, we should know the future values of the process after the time $t$. Consequently, the event $\{\tau = t\}$ is not generally determined by $\xi^t$.

Let, for example, $\xi_t$ be a stock price. Then $\tau$ in (2.4.2) is the time when the price attains its maximum. If $\tau$ were an optional moment, the stockholders would have known when to sell their shares to maximize profit. Clearly, this is not the case.


Generalizing definition (2.4.1), we also call $\tau$ an optional time in the following situations.


• The r.v. $\tau$ does not depend on the $\xi$’s at all; we could say that $\tau$ depends on the $\xi$’s in a trivial way.


• Condition (2.4.1) holds for each finite $t$, but with a positive probability no event $\{\tau \le t\}$ occurs. In this case, we say that $\tau = \infty$.


For example, we know that for the random walk with $p > 1/2$ (see Section 4.4.3.2), the probability that, starting from a level $u$, the process will hit zero (the ruin probability) is $q_u = [(1 - p)/p]^u < 1$. So, if $\tau$ is the moment of ruin, then $P(\tau = \infty) = 1 - q_u > 0$.


We will call an optional time $\tau$ for which $P(\tau < \infty) = 1$ a stopping time.

In Probability Theory, a r.v. $X$ for which $P(X < \infty) < 1$ is called improper or defective. Thus, an optional but not stopping time is a defective r.v.

Before turning to results, note also that definition (2.4.1) will not change if we replace $\{\tau \le t\}$ by $\{\tau = t\}$, which is typical in the literature when only the discrete time case is considered.

Indeed, the event $\{\tau \le t\} = \{\tau = 0\} \cup \{\tau = 1\} \cup \ldots \cup \{\tau = t\}$. For $k \le t$, if the event $\{\tau = k\}$ is determined by the values of $\xi^k$, then it is determined by the values of $\xi^t$. Then the union $\bigcup_{k=0}^{t} \{\tau = k\}$ is also determined by $\xi^t$. Vice versa, $\{\tau = t\} = \{\tau \le t\} - \{\tau \le t - 1\}$, and both events are determined by $\xi^t$.

The continuous time case is somewhat more complicated because for each time moment $t$, we should take into account the behavior of the process in an infinitesimal neighborhood of $t$. Without going too deeply into the theory, we just state that in this case, it is reasonable to define an optional time $\tau$ as a r.v. such that


The event $\{\tau < t\}$ is completely determined by the values of $\xi^t$,
i.e., the trajectory $\xi_s$ for all $s \in [0, t]$. (2.4.3)


The fact that the point $t$ is not included in the event $\{\tau < t\}$ allows us to avoid some technical difficulties. The difference between (2.4.1) and (2.4.3) is not essential because usually, for a “good process” in continuous time, the probability that a certain event will happen exactly at a fixed time $t$ is zero. For example, for the Poisson process, the probability that a new arrival will occur exactly at a fixed time $t_0$ is zero (say why).

All other definitions above are the same as in the discrete time case. 


We come now to a very useful fact which allows us to solve many problems concerning 
global characteristics of processes in an explicit way. Cases in point are ruin probabilities, the moments of reaching certain levels, etc. The significance of the corresponding theorems lies in the fact that the martingale property is preserved by optional stopping at certain random times.

In this section, we restrict ourselves to a specific property, namely, (2.2.11). We will see that under mild conditions, (2.2.11) continues to be true if we replace the certain time $t$ by a random stopping time $\tau$. Under such a replacement, the assertion of Proposition 4 becomes much stronger and deeper.

Thus, our next step is to establish conditions under which a martingale $X_t$ and a stopping time $\tau$ satisfy


The martingale stopping property: $E\{X_\tau\} = E\{X_0\}$. (2.4.4)


In Section 2.5, we consider a more general version of this property. 
First, note that (2.4.4) is not always true. 


EXAMPLE 1 (the doubling strategy) is classical. A player plays a game of chance consisting of a sequence of independent bets with probability $p > 0$ to win at each bet. Having started with a stake of one, the player plays until the first win, doubling her/his bet after each loss. After the first win, the player quits. In casinos, such a strategy is also called a “negative progression”.

In the scheme of Example 2.3-1, this corresponds to the bet $Y_1 = 1$, and $Y_{t+1} = 2^t I_{t+1}$ for $t = 1, 2, \ldots$, where $I_{t+1} = 1$ if $\xi_1 = \ldots = \xi_t = -1$ (there were only losses and the player keeps playing) and $I_{t+1} = 0$ otherwise (the player won before or at time $t$ and has quit). Then the profit at time 0 is $W_0 = 0$, and the profit at time $t > 0$ is $W_t = Y_1 \xi_1 + \ldots + Y_t \xi_t$. Note that so far, $\xi_t = \pm 1$ with probabilities $p$ and $1 - p$, respectively, rather than with equal probabilities.

If the first success happens at time $k + 1$, the player’s profit will be $2^k - (1 + 2 + 4 + \ldots + 2^{k-1}) = 1$. Let $\tau$ be the moment of the first win, that is, the number of plays to be played. The r.v. $\tau$ has a geometric distribution. More precisely, $P(\tau = k) = p(1 - p)^{k-1}$, $E\{\tau\} = 1/p$, and $P(\tau < \infty) = 1$ if $p > 0$. Hence, $\tau$ is a stopping time, that is, the probability that a success will never happen is zero. This means that the doubling strategy allows a player to get one unit of money with probability one, i.e., without any risk of losing money. It is worth noting, however, that this presupposes that the player should, at least theoretically, have an infinite initial capital. In Exercise 26, we discuss this from a somewhat more realistic point of view, but now it is important for us to consider the case $p = 1/2$.

The sequence $Y_t$ is predictable, and the process $W_t$ is a particular case of the process (2.3.1). Since $p = 1/2$, we have $E\{\xi_i\} = 0$ for all $i$, and by Proposition 5, the process $W_t$ is a martingale.

Now, the profit at the moment $\tau$ is equal to one, that is, $W_\tau = 1$. On the other hand, $W_0 = 0$, which means that (2.4.4) does not hold.
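
The failure of (2.4.4) for the doubling strategy is tied to the unboundedness of the stopping time. As a complementary sketch, suppose the player is forced to stop after at most $n$ plays; this cap $n$ is an assumption added here for illustration, as are the function names. Then the stopping time $\min\{\tau, n\}$ is bounded, and the expected profit can be computed exactly.

```python
from fractions import Fraction


def stopped_profit_distribution(n, p=Fraction(1, 2)):
    """Distribution {profit: probability} of W at time min(tau, n), n >= 1."""
    q = 1 - p
    dist = {}
    # First win at play k <= n: the profit is 2^(k-1) - (1 + ... + 2^(k-2)) = 1.
    dist[1] = sum(p * q ** (k - 1) for k in range(1, n + 1))
    # n losses in a row: the total loss is 1 + 2 + ... + 2^(n-1) = 2^n - 1.
    dist[-(2 ** n - 1)] = q ** n
    return dist


def expected_stopped_profit(n, p=Fraction(1, 2)):
    """E{W at time min(tau, n)} computed from the exact distribution."""
    return sum(w * prob for w, prob in stopped_profit_distribution(n, p).items())
```

For $p = 1/2$ the small probability of a long losing streak carries a loss large enough to offset the almost certain gain of one unit, so the capped strategy has zero mean profit for every $n$; the mean jumps to one only in the limit, where no cap is imposed.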


Modern dictionaries (see, e.g., [143]) give three meanings of the word ‘martingale’. The first concerns a strap of a horse’s harness keeping the horse from rearing its head; the second, a device for keeping a sail in a certain position; the third, a system of betting in which, after a losing wager, the amount bet is doubled. Probably, the use of the word in the third definition came by analogy with the first. Non-mathematical dictionaries do not give the fourth, and nowadays widespread, mathematical meaning of the word. To the author’s knowledge, in the mathematical sense and by analogy with the gambling case, the term martingale was first used by J. Ville in “Étude Critique de la Notion de Collectif” (1939)¹. Later, J. L. Doob’s book [35] made the martingale an important chapter of Probability Theory. The first use of martingales in Actuarial Modeling is due to H. Gerber and F. DeVylder; see [41], [42], [34].


Next, we establish some conditions under which (2.4.4) is true. 


Theorem 6 Let $X_t$ be a martingale and $\tau$ be a stopping time. Then the martingale stopping property (2.4.4) holds if at least one of the following conditions is true.


1. There exists a constant $c$ such that the r.v. $\tau \le c$ with probability one.


2. There exists a constant $C$ such that the r.v. $|X_t| \le C$ for all $t \le \tau$ with probability one.


3. Time is discrete, $E\{\tau\} < \infty$, and there exists a constant $C$ such that the conditional expectation $E\{|Z_{t+1}| \mid \xi^t\} \le C$ for all $t = 0, 1, \ldots$ with probability one, where the martingale differences $Z_{t+1} = X_{t+1} - X_t$.


We consider a generalization of this theorem and a proof in Section 2.5.

To comment on the conditions above, we interpret the stopping time as the time at which the process stops running. Condition 1 means that the process will stop before or at a finite time $c$. Condition 2 means that before or at the stopping time $\tau$, the process itself did not exceed a finite value $C$ in absolute value. Condition 3 concerns the discrete time case and means that the conditional expected absolute increments of the process are bounded.


EXAMPLE 2. For illustration, we show that none of these three conditions is satisfied in the situation of Example 1. Indeed, in this case $\tau$ has the geometric distribution and assumes any positive integer value with a positive probability. So, $\tau$ is not bounded. For $t < \tau$, the profit $W_t = -(1 + \ldots + 2^{t-1}) = -2^t + 1$ and, hence, is not bounded. The same concerns the profit in one play. In our example, the role of the martingale differences $Z_t$ is played by the r.v.’s $Y_t \xi_t$. If $t + 1 < \tau$, then $Y_{t+1} \xi_{t+1} = -2^t$, and the r.v. $E\{|Y_{t+1} \xi_{t+1}| \mid \xi^t\}$ assumes the value $2^t$. Hence, the r.v. $E\{|Y_{t+1} \xi_{t+1}| \mid \xi^t\}$ is not bounded.


Now, we turn to examples where Theorem 6 proves to be quite useful. 
2.4.2 Wald’s identity 


Proposition 7 Let $\xi_1, \xi_2, \ldots$ be i.i.d. r.v.’s, and let $m = E\{\xi_i\}$ be finite. Let $\tau$ be a stopping time with respect to the process $\xi_t$ and such that $E\{\tau\} < \infty$. Set $S_\tau = \sum_{i=1}^{\tau} \xi_i$. Then

$$E\{S_\tau\} = mE\{\tau\}. \tag{2.4.5}$$


It is recommended that the reader look up Proposition 3.1 where, in a slightly different notation, (2.4.5) was proved in the case when $\tau$ does not depend on the $\xi$’s. The result (2.4.5) is much stronger since now the number of terms in the sum depends on the values of the terms.


¹The author thanks Professor Michael Sharpe for this reference.


Proof. Let $X_0 = 0$, $X_t = S_t - mt$, where $S_t = \sum_{i=1}^{t} \xi_i$. In Example 2.2-1, we showed that $X_t$ is a martingale. The corresponding martingale differences are $Z_{t+1} = \xi_{t+1} - m$, and since the $\xi$’s are independent, $E\{|Z_{t+1}| \mid Z^t\} = E\{|\xi_{t+1} - m|\} \le E\{|\xi_{t+1}|\} + |m| = E\{|\xi_1|\} + |m|$. The last step is proper because the $\xi$’s are identically distributed. So, Condition 3 of Theorem 6 holds, and we can write

$$0 = E\{X_0\} = E\{X_\tau\} = E\{S_\tau - m\tau\} = E\{S_\tau\} - mE\{\tau\},$$

which implies (2.4.5). ∎


EXAMPLE 1. Consider the random walk as described in Section 4.4.3.2. Let $X_0 = 0$ (the process starts from zero level); $X_t = \xi_1 + \ldots + \xi_t$; $\xi_i = \pm 1$ with probabilities $p$ and $q = 1 - p$, respectively; $p > 1/2$; and $\tau_a = \min\{t : X_t \ge a\}$, the time of reaching a level $a > 0$ for the first time.

Since $m = E\{\xi_i\} = 2p - 1 > 0$, we have $E\{X_t\} = t(2p - 1) \to \infty$ as $t \to \infty$. Then, by the law of large numbers, starting from 0, the process will cross the level $a$ with probability one. Consequently, $\tau_a$ is a stopping time.

Formally, to find $E\{\tau_a\}$ with use of Wald’s identity (2.4.5), we should first prove that $E\{\tau_a\} < \infty$. We skip this preliminary step and turn directly to the value of $E\{\tau_a\}$.²

By (2.4.5), where the role of $S_t$ is played by $X_t$, we have $E\{X_{\tau_a}\} = mE\{\tau_a\} = (2p - 1)E\{\tau_a\}$. Assume that $a$ is an integer. Since in each step $X_t$ increases or decreases by one, at time $\tau_a$ the process will exactly take the value $a$ (rather than overshoot the level $a$). In other words, $X_{\tau_a} = a$. Then $a = (2p - 1)E\{\tau_a\}$, and

$$E\{\tau_a\} = \frac{a}{2p - 1}. \tag{2.4.6}$$
For example, if $p = 2/3$ and $a = 1$, then starting from zero, the process will reach the (next) level one in three steps on the average. See also Exercise 24.
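
Formula (2.4.6) lends itself to a simple Monte Carlo check. The sketch below is a simulation under stated assumptions: the sample size, the seed, and the safety cap `max_steps` are arbitrary choices, and the function names are hypothetical. It estimates the mean hitting time of level $a$ for the walk with up-probability $p > 1/2$.

```python
import random


def hitting_time(a, p, rng, max_steps=1_000_000):
    """Number of steps until the +/-1 walk started at 0 first reaches level a."""
    position, t = 0, 0
    while position < a:
        position += 1 if rng.random() < p else -1
        t += 1
        if t >= max_steps:  # safety cap; practically never reached for p > 1/2
            break
    return t


def mean_hitting_time(a, p, n_paths=100_000, seed=0):
    """Monte Carlo estimate of E{tau_a} over n_paths independent walks."""
    rng = random.Random(seed)
    return sum(hitting_time(a, p, rng) for _ in range(n_paths)) / n_paths
```

The estimate `mean_hitting_time(a, p)` should be compared with $a/(2p - 1)$; agreement is only up to sampling error, which shrinks as `n_paths` grows.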


EXAMPLE 2. In the previous example, let $p = 1/2$ and hence $m = 0$. Assume that $E\{\tau_a\} < \infty$. Then by Wald’s identity, we would have $a = 0 \cdot E\{\tau_a\} = 0$, which contradicts the assumption $a > 0$. Consequently, $E\{\tau_a\} = \infty$, which is not trivial at all.

However, this should be understood correctly. The above assertion does not mean that the process will take infinitely long to reach $a$; however, the mean time it will take is infinite. In its turn, this may be understood as follows. Suppose we run independent replicas of the same random walk; that is, we repeat the experiment many times. Let $\tau_{ai}$ be the value of the stopping time $\tau_a$ in the $i$th replica. Then, by the LLN, with probability one, $\frac{1}{n}(\tau_{a1} + \ldots + \tau_{an}) \to \infty$ as $n \to \infty$.


²The finiteness of $E\{\tau_a\}$ may be proved in many ways; however, it requires the knowledge of some additional facts; see, e.g., [122, p.368]. For the reader familiar with Mathematical Analysis, the easiest way could be to use Fatou’s lemma (see, e.g., [70], [129]) stating that for any sequence of non-negative r.v.’s $X_n \to X$, it is true that $E\{X\} \le \liminf E\{X_n\}$. For an integer $n$, consider the stopping moment $\min\{\tau_a, n\}$ whose mean is finite. By (2.4.5), $E\{S_{\min\{\tau_a, n\}}\} = mE\{\min\{\tau_a, n\}\}$, and since $E\{S_{\min\{\tau_a, n\}}\} \le a$, we have $E\{\min\{\tau_a, n\}\} \le a/m$. Then, by Fatou’s lemma, $E\{\tau_a\} \le \liminf E\{\min\{\tau_a, n\}\} \le a/m < \infty$.




EXAMPLE 3. Set again $p = 1/2$ and consider the r.v. $\tau_0 = \min\{t : X_t = 0,\ t = 1, 2, \ldots\}$, the time needed to revisit 0 starting from 0. In Section 4.4.5.2, we have proved that $P(\tau_0 < \infty) = 1$; that is, the random walk will revisit state 0 with probability one. (The reader who did not follow Route 3 may take this fact for granted.) Let us prove that though $\tau_0$ is finite, $E\{\tau_0\} = \infty$.

(For the reader who read Section 4.4.5, note that in other words we are going to prove the null recurrence of states of the symmetric random walk, which was stated in Section 4.4.5.3.)

Let $\tau'$ be the time needed to reach 0 after the first step (regardless of whether the process moved up or down). Then $\tau_0 = 1 + \tau'$, and it suffices to show that $E\{\tau'\} = \infty$.

In accordance with the notation $\tau_a$, let $\tau_1$ be the time of reaching 1 starting from 0. Let $\tau'_1$ be the time needed to reach 0 starting from 1, and $\tau'_{-1}$ be the time of reaching 0 starting from $-1$. Since we consider a symmetric random walk, the r.v.’s $\tau_1$, $\tau'_1$, and $\tau'_{-1}$ have the same distribution. Then $P(\tau' = k) = \frac{1}{2}P(\tau'_1 = k) + \frac{1}{2}P(\tau'_{-1} = k) = \frac{1}{2}P(\tau_1 = k) + \frac{1}{2}P(\tau_1 = k) = P(\tau_1 = k)$. Thus, $\tau'$ and $\tau_1$ also have the same distribution. As has been proved, $E\{\tau_a\}$ is infinite for any $a \ge 1$. Consequently, $E\{\tau'\}$ is also infinite.

EXAMPLE 4. Consider the general random walk when, as in Example 1, $X_t = \xi_1 + \ldots + \xi_t$, but the $\xi$’s are arbitrary i.i.d. r.v.’s with a finite mean $m > 0$. Let $\tau_a$, $a > 0$, be defined as above. In this case, we cannot write that $X_{\tau_a} = a$ since the process may overshoot the level $a$. But we can write $X_{\tau_a} \ge a$, which implies that $mE\{\tau_a\} \ge a$ and hence

$$E\{\tau_a\} \ge a/m.$$

There is one case, however, when we can write a precise equality. Let the $\xi$’s be exponentially distributed. Then, due to the memoryless property, given that the process has exceeded the level $a$, the overshoot has the same exponential distribution with the same parameter as the original $\xi$’s. (See Section 2.2.1.1.) Consequently, $E\{X_{\tau_a}\} = a + (\text{the mean value of the overshoot}) = a + m$. This implies

$$E\{\tau_a\} = \frac{a + m}{m}.$$

In Exercise 25, we prove that $E\{\tau_a\} = \infty$ if $m = 0$.
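
The exponential case of Example 4 can be illustrated by simulation. In the sketch below, the sample size, the seed, and the function names are assumptions made for illustration. Each path adds exponential steps with mean $m$ until the level $a$ is reached or exceeded, and the code records both the number of steps and the value of the process at that moment.

```python
import random


def steps_to_exceed(a, mean_step, rng):
    """Return (tau_a, X at time tau_a) for a walk with exponential steps."""
    position, t = 0.0, 0
    while position < a:
        # expovariate takes the rate, i.e., 1 / mean.
        position += rng.expovariate(1.0 / mean_step)
        t += 1
    return t, position


def estimate(a, mean_step, n_paths=50_000, seed=1):
    """Monte Carlo estimates of E{tau_a} and of E{X at time tau_a}."""
    rng = random.Random(seed)
    total_t = total_x = 0.0
    for _ in range(n_paths):
        t, x = steps_to_exceed(a, mean_step, rng)
        total_t += t
        total_x += x
    return total_t / n_paths, total_x / n_paths
```

The two estimates returned by `estimate(a, m)` should be close to $(a + m)/m$ and $a + m$, respectively, up to sampling error.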


2.4.3 The ruin probability for the simple random walk 


Consider again the classical random walk when $X_0 = u$, $X_t = u + \xi_1 + \ldots + \xi_t$ for $t = 1, 2, \ldots$, where the $\xi$’s are independent and take on values $\pm 1$ with probabilities $p$ and $q = 1 - p$, respectively. We assume that $u$ is a natural number and $0 \le u \le a$ for some fixed natural number $a > 0$. See also Section 4.4.3.2 for further detail. We will see how quickly one can compute the ruin probability in this case by making use of Theorem 6.

Let $\tau = \min\{t : X_t = 0 \text{ or } a\}$, the moment of reaching the boundaries of the corridor $[0, a]$. We set $p_u = P(X_\tau = a)$, the probability that the process will first reach the level $a$, and $q_u = P(X_\tau = 0)$, the probability that the process will first reach the level 0, i.e., the ruin probability.

First, let $p = 1/2$. The process $X_t$ is simultaneously a martingale and a Markov chain. The reader who skipped Section 4.4.5.2 from Route 3 should either take for granted the fact that in this case $P(\tau < \infty) = 1$, or look at the notion of recurrence and (4.4.5.8) in that section. Once we know that $\tau$ is a stopping time, the rest is straightforward.
Because $0 \le X_t \le a$ for $t \le \tau$, Condition 2 of Theorem 6 holds. Consequently, by Theorem 6,
$$E\{X_\tau\} = E\{X_0\} = u.$$

On the other hand, $E\{X_\tau\} = a p_u + 0 \cdot (1 - p_u) = a p_u$. Thus, $p_u = u/a$.

Let $p \neq 1/2$. We saw in Example 2.4.2-1 that for $p > 1/2$ the process will reach level $a$ with probability one. Hence, $\tau$ is a stopping time. Since 0 is also a barrier, the same argument implies that $\tau$ is a stopping time for $p < 1/2$.

Let us set $r = q/p$ and consider the process
$$Y_t = r^{X_t}.$$
The conditional expectation
$$E\{Y_{t+1} \mid \xi^t\} = E\{r^{X_t + \xi_{t+1}} \mid \xi^t\} = r^{X_t} E\{r^{\xi_{t+1}} \mid \xi^t\} = r^{X_t} E\{r^{\xi_{t+1}}\} = Y_t (rp + r^{-1} q).$$
By our choice of $r$, we have $rp + r^{-1} q = q + p = 1$, and consequently
$$E\{Y_{t+1} \mid \xi^t\} = Y_t.$$

Thus, $Y_t$ is a martingale with respect to $\{\xi_t\}$, and hence with respect to itself.
For $t \le \tau$, the values of $Y_t$ lie between $r^0 = 1$ and $r^a$. Hence, Condition 2 of Theorem 6 again holds. (Depending on whether $r > 1$ or not, $r^a > 1$ or $\le 1$.) Applying the theorem, we have

$$E\{Y_\tau\} = E\{Y_0\} = r^u.$$

On the other hand, $E\{Y_\tau\} = r^a p_u + 1 \cdot (1 - p_u) = 1 + p_u (r^a - 1)$. So, $p_u = (r^u - 1)/(r^a - 1)$, which coincides with (4.4.3.15).
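
The two formulas of this section, $p_u = u/a$ for $p = 1/2$ and $p_u = (r^u - 1)/(r^a - 1)$ for $p \neq 1/2$, can be cross-checked against the first-step relation $p_u = p \cdot p_{u+1} + q \cdot p_{u-1}$ with $p_0 = 0$ and $p_a = 1$. This relation is the standard one-step conditioning argument and is brought in here as an independent check; the function name below is hypothetical.

```python
from fractions import Fraction


def reach_a_probability(u, a, p):
    """p_u = P(X_tau = a) for the +/-1 walk started at u in the corridor [0, a]."""
    p = Fraction(p)
    q = 1 - p
    if p == q:
        return Fraction(u, a)          # symmetric case: p_u = u/a
    r = q / p
    return (r ** u - 1) / (r ** a - 1)  # asymmetric case with r = q/p


def satisfies_first_step_relation(a, p):
    """Check p_u = p*p_{u+1} + q*p_{u-1} for 0 < u < a, plus the boundary values."""
    p = Fraction(p)
    if reach_a_probability(0, a, p) != 0 or reach_a_probability(a, a, p) != 1:
        return False
    return all(
        reach_a_probability(u, a, p)
        == p * reach_a_probability(u + 1, a, p)
        + (1 - p) * reach_a_probability(u - 1, a, p)
        for u in range(1, a)
    )
```

The ruin probability is then `1 - reach_a_probability(u, a, p)`. Exact fractions avoid any ambiguity when $p$ is close to $1/2$ and $r^a - 1$ is small.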


2.4.4 The ruin probability for the Brownian motion with drift 


Let us consider the same exit problem for Brownian motion. Set $X_t = u + \mu t + \sigma w_t$, where $w_t$ is the standard Brownian motion, $\sigma > 0$, the initial point $u \in [0, a]$, and $a > 0$. Here $u$ and $a$ do not have to be integers. The basic process in our problem is $w_t$, and $X_0 = u$. Let, as in the previous section, $\tau = \min\{t : X_t = 0 \text{ or } a\}$, $p_u = P(X_\tau = a)$, $q_u = P(X_\tau = 0)$.


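Proposition 8 For $\mu = 0$,

$$p_u = u/a. \tag{2.4.7}$$
For $\mu \neq 0$,
$$p_u = \frac{1 - e^{-\gamma u}}{1 - e^{-\gamma a}}, \text{ where } \gamma = \frac{2\mu}{\sigma^2}. \tag{2.4.8}$$

In both cases, $q_u = 1 - p_u$.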


In the case when there is no upper barrier, that is, when $a = \infty$, the probability $p_u$ is the probability of never reaching zero, and, accordingly, $q_u$ is the ruin probability.


Corollary 9 If $a \to \infty$, then $p_u \to 0$, $q_u \to 1$ for $\mu \le 0$; and $p_u \to 1 - e^{-\gamma u}$ for $\mu > 0$.
Thus, for $\mu > 0$ and $a = \infty$, the ruin probability

$$q_u = 1 - p_u = e^{-\gamma u} = \exp\{-2\mu u/\sigma^2\}. \tag{2.4.9}$$


The reader may check that the formulas above coincide with approximations (4.4.3.17) 
and (4.4.3.18) and interpret this fact proceeding from the invariance principle of Section 
1.1.2. 
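
Formulas (2.4.7)-(2.4.9) are easy to evaluate numerically. The following sketch uses hypothetical function names and relies on `math.expm1` as an implementation choice for accuracy when $\gamma u$ or $\gamma a$ is small; neither comes from the text. It is meant for moderate values of $\gamma a$, since for strongly negative drift and a large $a$ the exponent can overflow in floating point.

```python
import math


def exit_at_a_probability(u, a, mu, sigma):
    """p_u = P(X_tau = a) for X_t = u + mu*t + sigma*w_t in the corridor [0, a]."""
    if mu == 0:
        return u / a                                   # (2.4.7)
    gamma = 2 * mu / sigma ** 2
    # (1 - e^{-gamma*u}) / (1 - e^{-gamma*a}); both signs flip, so the ratio is the same.
    return math.expm1(-gamma * u) / math.expm1(-gamma * a)   # (2.4.8)


def ruin_probability_no_upper_barrier(u, mu, sigma):
    """q_u for a = infinity: exp(-2*mu*u/sigma^2) if mu > 0, and 1 otherwise."""
    if mu <= 0:
        return 1.0
    return math.exp(-2 * mu * u / sigma ** 2)          # (2.4.9)
```

Two quick consistency checks are natural: as $\mu \to 0$ the value of `exit_at_a_probability` should approach $u/a$, and for $\mu > 0$ and a large $a$ the quantity `1 - exit_at_a_probability(u, a, mu, sigma)` should approach `ruin_probability_no_upper_barrier(u, mu, sigma)`.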


Proof of Proposition 8. First, let $\mu = 0$. In this case, $X_t = u + \sigma w_t$, and the probability
that during a time interval $[0,T]$ the process $X_t$ will never reach the level $a$ equals

$$P\left(\max_{t \le T} X_t < a\right) = P\left(\max_{t \le T} w_t < (a-u)/\sigma\right) = 2\Phi\left(\frac{a-u}{\sigma\sqrt{T}}\right) - 1$$

in accordance with (1.1.9). This probability converges to $2\Phi(0) - 1 = 0$ as $T \to \infty$, which
means that the process will reach the level $a$ with probability one. Thus, $\tau$ is a stopping
time.

The rest is very similar to what we did in the previous Section 2.4.3. The process $X_t$
is a martingale (see Example 2.2-2, and Exercise 17), and $0 \le X_t \le a$ for $t \le \tau$. Hence,
Condition 2 of Theorem 6 holds and $E\{X_\tau\} = E\{X_0\} = u$. On the other hand, $E\{X_\tau\} =
a p_u + 0 \cdot (1 - p_u) = a p_u$, where $p_u = P(X_\tau = a)$. This leads to $p_u = u/a$.

Let $\mu \neq 0$. By the LLN, with probability one, $X_t \to \infty$ if $\mu > 0$, and $X_t \to -\infty$ if $\mu < 0$.
This means that with probability one the process will exit the “corridor” $[0,a]$, and hence
$\tau$ is a stopping time. Another way to justify it is just to show that $P(0 < X_t < a) \to 0$ as
$t \to \infty$ for $\mu \neq 0$.

Next, we will use a nice technique which we will apply systematically in the next chapter.
Let $W_t = \exp\{s(\mu t + \sigma w_t)\}$, where $s$ is a number. The exponent is the Brownian motion
$\tilde\mu t + \tilde\sigma w_t$, where the drift $\tilde\mu = s\mu$ and the parameter $\tilde\sigma = s\sigma$. We apply to this process
Proposition 3. In our case, the characteristic $\alpha$ from this proposition is equal to $\tilde\alpha = \tilde\mu +
\tilde\sigma^2/2 = s(\mu + s\sigma^2/2)$. Now we choose $s$ for which $\tilde\alpha = 0$, that is, we set $s = -\gamma$, where
$\gamma = 2\mu/\sigma^2$. For the $s$ chosen, by Proposition 3, the process $W_t$ is a martingale.

If we multiply a martingale by a constant, the martingale property continues to hold
(why?). Hence, the process $Y_t = e^{-\gamma u} W_t = \exp\{-\gamma(u + \mu t + \sigma w_t)\} = \exp\{-\gamma X_t\}$ is also a
martingale. It is a counterpart of $Y_t$ from Section 2.4.3.

Since $0 \le X_t \le a$ for $t \le \tau$, the values of $Y_t$ lie between 1 and $e^{-\gamma a}$, and Condition 2 again
holds. Finally, we have

$$E\{Y_\tau\} = E\{Y_0\} = e^{-\gamma u}.$$

On the other hand, $E\{Y_\tau\} = e^{-\gamma a} p_u + 1 \cdot (1 - p_u) = 1 + p_u(e^{-\gamma a} - 1)$. This readily leads to
(2.4.8). ∎
Route 2 => page 339 




2.4.5 The distribution of the ruin time in the case of Brownian motion 


Next, we consider the model of the previous section in the particular case $a = \infty$. So,
again $X_t = u + \mu t + \sigma w_t$, $\sigma > 0$, the initial point $u \in [0, \infty)$, and $\tau = \min\{t : X_t = 0\}$. However,
now we solve the more difficult problem of finding the distribution of the r.v. $\tau$. We
know already that for $\mu \le 0$, this r.v. is a stopping time, i.e., $P(\tau < \infty) = 1$, while for
$\mu > 0$, this is not the case because $\tau = \infty$ with a positive probability. So, $\tau$ may be an
improper or defective r.v.

Set $F_u(t) = P(\tau \le t)$. The index $u$ indicates that this function depends on the fixed parameter
$u$. The function $F_u(t)$ has all properties of d.f.’s with one exception: if $\tau$ is improper,
then $F_u(\infty) = P(\tau < \infty) < 1$.

Set $f_u(t) = F_u'(t)$, the probability density of $\tau$. This function has all properties of densities
with the exception that, if $\tau$ is improper, then $\int_0^\infty f_u(t)\,dt = P(\tau < \infty) < 1$.

Let for $z \ge 0$,

$$M_u(z) = \int_0^\infty e^{-zt} f_u(t)\,dt.$$

The function $M_u(z)$ has all properties of m.g.f.’s except that if $\tau$ is improper, then $M_u(0) =
P(\tau < \infty) < 1$.

Note that for $z > 0$ we can write that $M_u(z) = E\{\exp\{-z\tau\}\}$ if we set, by convention,
$\exp\{-z\tau\} = 0$ for $\tau = \infty$.


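Proposition 10 For $\sigma > 0$, all $\mu$, and $t > 0$,

$$f_u(t) = \frac{u}{\sigma\sqrt{2\pi}\,t^{3/2}} \exp\left\{-\frac{1}{2\sigma^2}\left(2\mu u + t\mu^2 + \frac{u^2}{t}\right)\right\}, \tag{2.4.10}$$

$$F_u(t) = \Phi\left(-\frac{u + \mu t}{\sigma\sqrt{t}}\right) + \exp\left\{-\frac{2\mu u}{\sigma^2}\right\} \Phi\left(\frac{-u + \mu t}{\sigma\sqrt{t}}\right), \tag{2.4.11}$$

$$M_u(z) = \exp\left\{-\frac{u}{\sigma^2}\left(\mu + \sqrt{\mu^2 + 2z\sigma^2}\right)\right\}. \tag{2.4.12}$$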


Some comments. The fact that (2.4.10) is the derivative of (2.4.11) may be verified by
direct differentiation.

Since $P(\tau \le 0) = 0$, we must have $F_u(0) = 0$. The reader is invited to check that, indeed,
$F_u(t) \to 0$ as $t \to 0$. So, we have chosen a correct antiderivative of $f_u(t)$.

The reader can also verify that $F_u(\infty) = 1$ for $\mu \le 0$, and $F_u(\infty) = \exp\{-2\mu u/\sigma^2\}$ for $\mu > 0$,
which coincides with (2.4.9).

For $\mu = 0$, $\sigma = 1$, (2.4.11) becomes (1.1.8). (The distribution of the time when $u + w_t$
reaches zero is the same as for the time when $w_t$ reaches $u$.)

It is interesting to check the value $M_u(0)$. For $\mu \le 0$, it is indeed one, since in this case,
$M_u(0) = \exp\left\{-\frac{u}{\sigma^2}\left(\mu + \sqrt{\mu^2}\right)\right\} = \exp\left\{-\frac{u}{\sigma^2}(\mu + |\mu|)\right\} = e^0 = 1$. However, for $\mu > 0$, we
have $M_u(0) = \exp\left\{-\frac{u}{\sigma^2}\left(\mu + \sqrt{\mu^2}\right)\right\} = \exp\left\{-\frac{u}{\sigma^2}\,2\mu\right\}$, which again coincides with (2.4.9).

In Exercise 34, the reader is invited to explain why and how for $\mu = 0$ and $\sigma = 1$ (2.4.11)
leads to (1.1.8).
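The checks suggested in these comments can also be carried out numerically. Below is a minimal sketch, using only the standard library, that codes (2.4.10)-(2.4.12) and compares a finite-difference derivative of $F_u$ with $f_u$, and $F_u(t)$ for large $t$ with $M_u(0)$. The function names and the particular parameter values are illustrative assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ruin_time_density(t, u, mu, sigma):
    """f_u(t) from (2.4.10)."""
    scale = u / (sigma * math.sqrt(2.0 * math.pi) * t**1.5)
    return scale * math.exp(-(2.0 * mu * u + t * mu**2 + u**2 / t) / (2.0 * sigma**2))

def ruin_time_cdf(t, u, mu, sigma):
    """F_u(t) = P(tau <= t) from (2.4.11)."""
    s = sigma * math.sqrt(t)
    return (norm_cdf(-(u + mu * t) / s)
            + math.exp(-2.0 * mu * u / sigma**2) * norm_cdf((mu * t - u) / s))

def ruin_time_mgf(z, u, mu, sigma):
    """M_u(z) from (2.4.12), z >= 0."""
    return math.exp(-u / sigma**2 * (mu + math.sqrt(mu**2 + 2.0 * z * sigma**2)))

u, mu, sigma, t, h = 1.0, 0.3, 0.8, 2.0, 1e-5
slope = (ruin_time_cdf(t + h, u, mu, sigma) - ruin_time_cdf(t - h, u, mu, sigma)) / (2 * h)
print(slope, ruin_time_density(t, u, mu, sigma))                     # F' against f
print(ruin_time_cdf(1e6, u, mu, sigma), ruin_time_mgf(0.0, u, mu, sigma))  # F(large t) against M(0)
```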




Examples of direct applications of this theorem are given in Exercises 30-31. A less 
direct application concerning a hitting time is considered in the next section. 


Proof of Proposition 10. We restrict ourselves below to a short but somewhat non-rigorous
proof in order to demonstrate again the usefulness of the martingale stopping
property (2.4.4). In Section 2.5.3, we will prove (2.4.10) in a more direct and rigorous way
making use of some particular fact concerning Brownian motion.

As in Section 2.4.4, consider the process $W_t = \exp\{s(\mu t + \sigma w_t)\}$, where now $s$ will play
the role of a free parameter. We have seen in Section 2.4.4 that the characteristic $\alpha$ from
Proposition 3 in this case is $\tilde\alpha = s\mu + s^2\sigma^2/2$. So, the process $\exp\{-\tilde\alpha t + s(\mu t + \sigma w_t)\}$ is
a martingale. Multiplying it by $e^{su}$, which does not change the martingale property, we
come to the process $Y_t = \exp\{-\tilde\alpha t + su + s(\mu t + \sigma w_t)\} = \exp\{-\tilde\alpha t + sX_t\}$, which is also a
martingale.

The lack of rigor in the next step comes from the fact that we apply property (2.4.4) to the
optional time $\tau$ which for $\mu > 0$ is an improper (defective) r.v., that is, not a stopping time.
Justification of this step requires some work and we omit it here. (One way is to provide
calculations for the barrier $a < \infty$ and let $a \to \infty$ at the very end.)

Since $X_0 = u$, making use of the martingale stopping time property, we have $E\{Y_\tau\} =
E\{Y_0\} = e^{su}$. On the other hand, $E\{Y_\tau\} = E\{\exp\{-\tilde\alpha\tau\}\}$, since $X_\tau = 0$ by the definition of $\tau$.
Thus, $E\{\exp\{-\tilde\alpha\tau\}\} = e^{su}$, or

$$E\{\exp\{-(s\mu + s^2\sigma^2/2)\tau\}\} = \exp\{su\}. \tag{2.4.13}$$

Now, we set $s\mu + s^2\sigma^2/2 = z \ge 0$, solve for $s$, and choose the negative root

$$s = -\frac{1}{\sigma^2}\left(\mu + \sqrt{\mu^2 + 2z\sigma^2}\right). \tag{2.4.14}$$

(If we had chosen the positive root, the r.-h.s. of (2.4.13) would have been larger than one,
while the l.-h.s. would have been less than one.)
From (2.4.13)-(2.4.14) it follows that

$$E\{\exp\{-z\tau\}\} = \exp\left\{-\frac{u}{\sigma^2}\left(\mu + \sqrt{\mu^2 + 2z\sigma^2}\right)\right\},$$

which is the m.g.f. of $\tau$. So, we have come to (2.4.12).

The verification that (2.4.12) is indeed the m.g.f. (or Laplace transform) of (2.4.10) is
lengthy but consists in pure integration, so we can turn to tables of integrals, for example,
in [51]. At least, what we did completes the probabilistic part of the problem.

As was mentioned, in Section 2.5.3, we will come to the same solution in a more direct
way. ∎


2.4.6 The hitting time for the Brownian motion with drift 


Next, we connect Proposition 10 with the problem of hitting time. Now let $X_t = \mu t + \sigma w_t$,
and let $\tau_a = \min\{t : X_t = a\}$, the time of hitting a level $a > 0$. It makes sense to emphasize
that now we consider only one barrier. Assume that $\mu > 0$. Then $X_t$ is a process with a
positive drift, and it will reach the level $a$ with probability one. (Show that $P(X_t > a) \to 1$
as $t \to \infty$ for any $a$.)




The event $\{X_t = a\} = \{\mu t + \sigma w_t = a\} = \{a - \mu t - \sigma w_t = 0\}$. By virtue of symmetry,
the distribution of the process $-\sigma w_t$ coincides with the distribution of $\sigma w_t$. Hence, the
probabilities of all events we are considering will not change if we replace the process
$-\sigma w_t$ by $\sigma w_t$.

Thus, we may consider the first time $t$ when the process $a - \mu t + \sigma w_t$ will reach zero
level. This is the ruin time for the process $\tilde X_t = a - \mu t + \sigma w_t$. In order to use results of the
previous section, in these results, we should replace $\mu$ by $-\mu$ and set $u = a$. Thus, we have
arrived at


Corollary 11 For the process $X_t = \mu t + \sigma w_t$ above,

$$P(\tau_a \le t) = \Phi\left(\frac{\mu t - a}{\sigma\sqrt{t}}\right) + \exp\left\{\frac{2\mu a}{\sigma^2}\right\} \Phi\left(-\frac{a + \mu t}{\sigma\sqrt{t}}\right). \tag{2.4.15}$$


EXAMPLE 1. We generalize Example 1.1.3-1. As in this example, suppose that you
own a stock whose current price is $S_0 = 100$, and the price changes in time as the process
$S_t = S_0 \exp\{\mu t + \sigma w_t\}$, where the expected return $\mu = 0.1$ and the volatility $\sigma = 0.15$. The
difference is that now we consider a non-zero drift. You decided to sell your shares when
the price has increased by 10%. Find the probability that this will not happen within the
first year.

You will sell your stock at the first time $t$ when $S_t \ge 1.1 S_0$. The last inequality is equivalent
to $\exp\{\mu t + \sigma w_t\} \ge 1.1$, or $\mu t + \sigma w_t \ge a = \ln(1.1)$. It remains to use (2.4.15) with
$t = 1$, $\mu = 0.1$, $\sigma = 0.15$. Calculations give $P(\tau_a \le 1) \approx 0.74$, and $P(\tau_a > 1) \approx 0.26$.
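The calculation in this example is short enough to reproduce. The sketch below evaluates (2.4.15) with the parameters of the example; the helper names are illustrative, and the normal d.f. is computed through `math.erf`.

```python
import math

def norm_cdf(x):
    """Standard normal d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def hitting_prob_by_time(a, t, mu, sigma):
    """P(tau_a <= t) from (2.4.15) for X_t = mu*t + sigma*w_t and a level a > 0."""
    s = sigma * math.sqrt(t)
    return (norm_cdf((mu * t - a) / s)
            + math.exp(2.0 * mu * a / sigma**2) * norm_cdf(-(a + mu * t) / s))

a = math.log(1.1)                                   # a 10% price increase
p_sell = hitting_prob_by_time(a, t=1.0, mu=0.1, sigma=0.15)
print(round(p_sell, 2), round(1.0 - p_sell, 2))     # prints 0.74 0.26
```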


2.5 Generalizations 
2.5.1 The martingale property in the case of random stopping time 


Now we turn to a proof of Theorem 6. However, we do not aim to prove it in full. Rather, 
our goal is to show that all its assertions are plausible and to demonstrate the main ideas of 
proofs. So, we will restrict ourselves to the discrete time case. 

As was mentioned in Section 2.4, Theorem 6 is a corollary of the fact that, under some 
conditions, the martingale property is preserved by optional stopping. In other words, under 
some conditions, the time s in the definition (2.2.2) of a martingale may be random. 

Let $t = 0, 1, \ldots$, the process $X_t$ be a martingale, and $\tau$ be an optional stopping time with
respect to a basic process $\xi_t$. For an event $A$, we denote by $1_A$ the indicator of $A$, that is,
the r.v. $1_A$ taking a value of 1 if $A$ occurs, and 0 otherwise. The symbol $t \wedge \tau$ stands for
$\min\{t, \tau\}$.


Theorem 12 Assume that

$$E\{|X_\tau|\} < \infty, \tag{2.5.1}$$

and

$$\lim_{t \to \infty} E\{|X_t| 1_{\{\tau > t\}}\} = 0. \tag{2.5.2}$$

Then for any $t$,

$$E\{X_\tau \mid \xi^t\} = X_{\tau \wedge t}. \tag{2.5.3}$$




A proof is given in Section 2.5.4. Now, we discuss the assertion of the theorem. 


A. Let $\tau$ be non-random and be equal to a fixed $s$. Then, for $t \le s$, (2.5.3) becomes the
equation $E\{X_s \mid \xi^t\} = X_t$, which is the definition of a martingale. For $t > s$, (2.5.3)
gives $E\{X_s \mid \xi^t\} = X_s$ by the definition of conditional expectation.


B. Setting $t = 0$ in (2.5.3), we have $E\{X_\tau \mid \xi^0\} = X_0$. (If we set $t = 0$ in $\xi^t$, we get the r.v.
$\xi_0$.) Taking the expected values of both sides, we come to the martingale stopping
property (2.4.4).


Let us turn to conditions (2.5.1)-(2.5.2). 


C. To show that (2.5.2) may not hold, consider again the classical example with the
doubling strategy. We saw in Examples 2.4.1-1 and 2.4.1-2 that for $\tau > t$ the profit
$W_t = -2^t + 1$, and $P(\tau > t) = 2^{-t}$. Hence, $E\{|W_t| 1_{\{\tau > t\}}\} = (2^t - 1)E\{1_{\{\tau > t\}}\} =
(2^t - 1)P(\tau > t) = 1 - 2^{-t} \not\to 0$. One may say that $W_t$ grows too fast.
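The quantity in item C can be tabulated exactly. The following sketch assumes, as in the doubling strategy with a fair coin, stakes $1, 2, \ldots, 2^{t-1}$ in the first $t$ plays; the function name is illustrative.

```python
from fractions import Fraction

def doubling_tail_term(t):
    """E{|W_t| 1_{tau > t}} for the doubling strategy with a fair coin."""
    loss_so_far = sum(2**k for k in range(t))      # stakes 1, 2, ..., 2^(t-1), all lost
    prob_no_win_yet = Fraction(1, 2**t)            # P(tau > t) = 2^(-t)
    return loss_so_far * prob_no_win_yet

for t in (1, 5, 10, 20):
    # each value equals 1 - 2^(-t), so the sequence tends to 1 rather than to 0
    print(t, doubling_tail_term(t))
```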


Consider now particular cases where (2.5.1)-(2.5.2) are true. 


D. Let Condition 1 of Theorem 6 hold, that is, $\tau \le$ some $c$. Since we consider the
discrete time case, we may assume that $c$ is an integer. For all $t \ge c$, the set $\{\tau > t\}$
in (2.5.2) is empty, $1_{\{\tau > t\}} = 0$, and $E\{|X_t| 1_{\{\tau > t\}}\} = 0$. In this case, Condition (2.5.1)
holds because $|X_\tau| \le \max_{i \le c} |X_i| \le |X_1| + \ldots + |X_c|$, and the expectation of the last r.v.
is finite since $E\{X_i\}$ is finite for all $i$.


E. Let Condition 2 of Theorem 6 be true, that is, $|X_t| \le$ some $C$ for all $t \le \tau$. Then,
first, $|X_\tau| \le C$, and (2.5.1) is obviously true. Since $1_{\{\tau > t\}} = 0$ when $t \ge \tau$, we have
$E\{|X_t| 1_{\{\tau > t\}}\} \le C E\{1_{\{\tau > t\}}\} = C P(\tau > t) \to 0$ as $t \to \infty$, because $\tau$ is a stopping
time and $P(\tau < \infty) = 1$.


F. Verification of Condition 3 of Theorem 6 requires some work which we relegate to 
Section 2.5.5. 


2.5.2 A reduction to the standard Brownian motion in the case of random time


Here we will generalize Proposition 2 from Section 1.2.2 for the case of stopping times. 

Let $X_t = \mu t + w_t$ (as in Section 1.2.2, we set, for simplicity, $\sigma = 1$). We again use the
notation $X^t$ and consider a function $g(X^t)$ as a function of the whole trajectory $X_u$, $0 \le u \le t$.
If we write $g(X^\tau)$, where $\tau$ is an optional time, we mean the whole trajectory until the
random time $\tau$. For example, $g(X^\tau)$ may equal $\max_{0 \le u \le \tau} X_u$.

Denote by $E_0\{g(X^\tau)\}$ the expectation in the case $\mu = 0$, that is, in the case of the standard
Brownian motion.

We do not exclude the case when $P(\tau = \infty) > 0$, but we assume that $E\{g(X^\tau)\}$ and
$E_0\{g(X^\tau)\}$ are finite.


Proposition 13 Let $\tau$ be an optional time. Then

$$E\{g(X^\tau)\} = E_0\{g(X^\tau) \exp\{\mu X_\tau - \tau\mu^2/2\}\}. \tag{2.5.4}$$




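(Compare with (1.2.5).)
Proof. We provide it only for the case when $\tau$ is a discrete r.v. taking some values $t_1, t_2, \ldots$.
In this case, we can write that

$$E\{g(X^\tau)\} = E\left\{\sum_k g(X^\tau) 1_{\{\tau = t_k\}}\right\} = \sum_k E\{g(X^{t_k}) 1_{\{\tau = t_k\}}\}. \tag{2.5.5}$$

Since the event $\{\tau = t_k\}$ depends only on the trajectory of the process until time $t_k$, the r.v.
$g(X^{t_k}) 1_{\{\tau = t_k\}}$ is a function of $X^{t_k}$. Then, by (1.2.5),

$$E\{g(X^{t_k}) 1_{\{\tau = t_k\}}\} = E_0\{g(X^{t_k}) 1_{\{\tau = t_k\}} \exp\{\mu X_{t_k} - t_k\mu^2/2\}\}.$$

Together with (2.5.5), it implies that

$$E\{g(X^\tau)\} = E_0\left\{\sum_k 1_{\{\tau = t_k\}}\, g(X^{t_k}) \exp\{\mu X_{t_k} - t_k\mu^2/2\}\right\} = E_0\{g(X^\tau) \exp\{\mu X_\tau - \tau\mu^2/2\}\}. \quad ∎$$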
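To see the reweighting in (2.5.4) at work, consider the simplest special case: a non-random time $\tau = t$ and $g(X^t) = 1_{\{X_t > c\}}$, which is the setting of (1.2.5). The sketch below compares $P(\mu t + w_t > c)$, computed directly, with $E_0\{1_{\{w_t > c\}} \exp\{\mu w_t - t\mu^2/2\}\}$, computed by a midpoint rule. The function names, the integration cut-off, and the number of grid points are assumptions made for the illustration.

```python
import math

def norm_cdf(x):
    """Standard normal d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tail_prob_direct(c, t, mu):
    """P(mu*t + w_t > c), computed directly."""
    return 1.0 - norm_cdf((c - mu * t) / math.sqrt(t))

def tail_prob_reweighted(c, t, mu, n=20000, upper=12.0):
    """E_0{ 1[w_t > c] * exp(mu*w_t - t*mu^2/2) } by the midpoint rule.

    The integral is taken over x in (c, mu*t + upper*sqrt(t)); the cut-off and n
    are numerical assumptions, not part of the statement being illustrated.
    """
    hi = mu * t + upper * math.sqrt(t)
    h = (hi - c) / n
    total = 0.0
    for k in range(n):
        x = c + (k + 0.5) * h
        density0 = math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)  # law of w_t
        total += density0 * math.exp(mu * x - t * mu * mu / 2.0)               # weight from (2.5.4)
    return total * h

print(tail_prob_direct(1.0, 2.0, 0.4), tail_prob_reweighted(1.0, 2.0, 0.4))
```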


2.5.3 The distribution of the ruin time in the case of Brownian motion: 
another approach 


In this section, we demonstrate how the last proposition can help to solve the problem of 
Section 2.4.5. 

Setting, for now, the variance parameter $\sigma = 1$, consider the process $X_t = u + \mu t + w_t$ and
the r.v. $\tau = \inf\{t : X_t = 0\}$. Note that $\tau$ may be defective. Let $S_t = \mu t + w_t$. Then $X_t = u + S_t$
and $\tau = \inf\{t : S_t = -u\}$. Thus, by definition, $X_\tau = 0$ and $S_\tau = -u$.

Being slightly non-rigorous, consider the event $A_t = \{\tau \in [t, t + dt]\}$, viewing $dt$ as an
infinitesimal increment of $t$. To justify such an approach, one should consider an interval
$[t, t + \delta]$ and apply the limiting argument, letting $\delta \to 0$.

The process $S_t$ is a Brownian motion with drift, and we can apply Proposition 13 to
functions of this process. (We cannot apply this proposition directly to $X_t$, since it is a
shifted Brownian motion.) We have

$$P(A_t) = E\{1_{A_t}\} = E_0\{1_{A_t} \exp\{\mu S_\tau - \tau\mu^2/2\}\} = E_0\{1_{A_t} \exp\{\mu(-u) - \tau\mu^2/2\}\}. \tag{2.5.6}$$

Since the r.v. inside $E_0\{\cdot\}$ is equal to zero if $\tau \notin [t, t + dt]$, and since $dt$ is infinitely small,
we can set $\tau = t$, which implies that

$$P(A_t) = \exp\{-\mu u - t\mu^2/2\} E_0\{1_{A_t}\} = \exp\{-\mu u - t\mu^2/2\} P_0(A_t),$$

where the probability $P_0(A_t) = P_0(\tau \in [t, t + dt])$ corresponds to the case $\mu = 0$. In the
last case, $S_t = w_t$ and $\tau = \inf\{t : S_t = -u\}$, the first moment when the standard Brownian
motion reaches level $(-u)$. In view of the symmetry of $w_t$, we can replace $-u$ by $u$ and use
(1.1.8), which gives us the d.f. of $\tau$ in this case. Thus,

$$P_0(\tau \in [t, t + dt]) = \frac{\partial}{\partial t}\left(2\left(1 - \Phi(u/\sqrt{t})\right)\right) dt = \frac{u}{t^{3/2}}\,\varphi(u/\sqrt{t})\,dt = \frac{u}{\sqrt{2\pi}\,t^{3/2}} \exp\{-u^2/2t\}\,dt,$$

where $\varphi$ is the standard normal density.


Combining it with (2.5.6), we come to the density of $\tau$:

$$f_u(t) = \frac{u}{\sqrt{2\pi}\,t^{3/2}} \exp\left\{-\frac{1}{2}\left(2\mu u + t\mu^2 + \frac{u^2}{t}\right)\right\}, \tag{2.5.7}$$

which coincides with (2.4.10) for $\sigma = 1$.

To consider the case of an arbitrary $\sigma$, it suffices to notice that in the general case, $X_t =
u + \mu t + \sigma w_t = \sigma\left(\frac{u}{\sigma} + \frac{\mu}{\sigma}t + w_t\right)$, and $X_t = 0$ iff $\frac{u}{\sigma} + \frac{\mu}{\sigma}t + w_t = 0$. Thus, to obtain the formula
for the general case, it suffices to replace $u$ by $u/\sigma$ and $\mu$ by $\mu/\sigma$ in (2.5.7). This leads to
(2.4.10).

We saw in Section 2.4.5 how to arrive at (2.4.11). As to the m.g.f. (2.4.12), one can
derive it from (2.4.10) by direct (although complicated) integration, but the technique we
used in Section 2.4.5 is certainly simpler. Comparing the two methods, the one from this section
and the one from Section 2.4.5, we see that the former is more convenient for deriving the distribution,
and the latter for the m.g.f. ∎


2.5.4 Proof of Theorem 12 


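If $\tau < t$, the value of the r.v. $X_\tau$ is completely determined by $\xi^t$, and hence the l.-h.s. of
(2.5.3)

$$E\{X_\tau \mid \xi^t\} = X_\tau = X_{\tau \wedge t}.$$

That is, (2.5.3) is trivial in this case. Consider the case $\tau \ge t$.

Let an event $A = \{\xi^t \in B\}$, where $B$ is an arbitrary set of values of $\xi^t$. (More precisely, an
arbitrary set the probability of which is well defined; we skip here formalities concerning
this issue.) Here and below, $E\{X; A\}$ stands for $E\{X 1_A\}$. Proceeding from the definition of conditional expectation, we should prove that

$$E\{X_\tau; A \cap \{\tau \ge t\}\} = E\{X_t; A \cap \{\tau \ge t\}\}. \tag{2.5.8}$$

Since we consider only the discrete time case, we can write

$$E\{X_t; A \cap \{\tau \ge t\}\} = E\{X_t; A \cap \{\tau = t\}\} + E\{X_t; A \cap \{\tau > t\}\}.$$

Clearly, $E\{X_t; A \cap \{\tau = t\}\} = E\{X_\tau; A \cap \{\tau = t\}\}$. The event $A \cap \{\tau > t\}$ is completely
determined by values of $\xi^t$. By the martingale property, $X_t = E\{X_{t+1} \mid \xi^t\}$. Hence,

$$E\{X_t; A \cap \{\tau > t\}\} = E\{E\{X_{t+1} \mid \xi^t\}; A \cap \{\tau > t\}\} = E\{E\{X_{t+1} \mid \xi^t\} 1_{A \cap \{\tau > t\}}\}$$
$$= E\{E\{X_{t+1} 1_{A \cap \{\tau > t\}} \mid \xi^t\}\} = E\{X_{t+1} 1_{A \cap \{\tau > t\}}\} = E\{X_{t+1} 1_{A \cap \{\tau \ge t+1\}}\}$$
$$= E\{X_{t+1}; A \cap \{\tau \ge t+1\}\}.$$

Thus,

$$E\{X_t; A \cap \{\tau \ge t\}\} = E\{X_\tau; A \cap \{\tau = t\}\} + E\{X_{t+1}; A \cap \{\tau \ge t+1\}\}.$$

The second term on the right is the term on the left with $t$ replaced by $t + 1$. So, repeating
the argument $m - t - 2$ times, we have

$$E\{X_t; A \cap \{\tau \ge t\}\} = E\{X_\tau; A \cap \{\tau = t\}\} + E\{X_\tau; A \cap \{\tau = t+1\}\} + E\{X_{t+2}; A \cap \{\tau \ge t+2\}\}$$
$$= E\{X_\tau; A \cap \{t \le \tau \le t+1\}\} + E\{X_{t+2}; A \cap \{\tau \ge t+2\}\}$$
$$= \ldots = E\{X_\tau; A \cap \{t \le \tau \le m-1\}\} + E\{X_m; A \cap \{\tau \ge m\}\}$$
$$= E\{X_\tau; A \cap \{t \le \tau \le m-1\}\} + E\{X_\tau; A \cap \{\tau = m\}\} + E\{X_m; A \cap \{\tau > m\}\}$$
$$= E\{X_\tau; A \cap \{t \le \tau \le m\}\} + E\{X_m; A \cap \{\tau > m\}\}.$$

From this it follows that

$$E\{X_\tau; A \cap \{t \le \tau \le m\}\} = E\{X_t; A \cap \{\tau \ge t\}\} - E\{X_m; A \cap \{\tau > m\}\}. \tag{2.5.9}$$

The term on the left, $E\{X_\tau; A \cap \{t \le \tau \le m\}\} \to E\{X_\tau; A \cap \{t \le \tau\}\}$ as $m \to \infty$,
because $E\{X_\tau\}$ exists and is finite. For the second term on the right, we have

$$|E\{X_m; A \cap \{\tau > m\}\}| \le E\{|X_m|; \tau > m\} \to 0 \text{ as } m \to \infty,$$

by (2.5.2). So, (2.5.8) has been proved. ∎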
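For a bounded stopping time, identity (2.5.3) can be verified by brute force. The sketch below is an illustration under stated assumptions: a symmetric $\pm 1$ random walk started at 0, and $\tau$ equal to the first time $|X_t|$ reaches a barrier, capped at a finite horizon (so that Condition 1 of Theorem 6 holds). It enumerates all paths, averages $X_\tau$ over the continuations of each prefix $\xi^t$, and compares the result with $X_{\tau \wedge t}$. All names and parameter values are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def stopped_walk(steps, barrier):
    """Return (tau, path): tau is the first time |X| hits the barrier, capped at len(steps)."""
    x, path = 0, [0]
    for s in steps:
        x += s
        path.append(x)
    tau = next((k for k, v in enumerate(path) if abs(v) == barrier), len(steps))
    return tau, path

def conditional_expectations(horizon=6, t=3, barrier=2):
    """For every prefix of length t, return (E{X_tau | prefix}, X_{tau ^ t})."""
    groups = defaultdict(list)
    for steps in product((-1, 1), repeat=horizon):      # all paths are equally likely
        tau, path = stopped_walk(steps, barrier)
        groups[steps[:t]].append((path[tau], path[min(tau, t)]))
    return {
        prefix: (Fraction(sum(v[0] for v in vals), len(vals)), vals[0][1])
        for prefix, vals in groups.items()
    }

results = conditional_expectations()
print(all(cond == stopped for cond, stopped in results.values()))
```

Exact fractions avoid any rounding question; the value $X_{\tau \wedge t}$ is the same for all paths sharing a prefix, which is why the first element of each group is used.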


2.5.5 Verification of Condition 3 of Theorem 6 


We should show that if the condition mentioned holds, then the conditions of Theorem 
12 are also satisfied. 

For simplicity and without loss of generality, let $X_0 = 0$. Then $X_t = Z_1 + \ldots + Z_t$. Let
$Y_i = 1_{\{\tau \ge i\}}$. The main idea of the proof is to write

$$X_\tau = \sum_{i=1}^{\tau} Z_i = \sum_{i=1}^{\infty} Y_i Z_i$$

since once $i > \tau$, the terms in the sum vanish. Observe that whether $Y_i = 1$ or not depends
on the values of $\xi^{i-1}$, that is, the sequence $Y_i$ is predictable. Then, by Proposition 5 from
Section 2.3, $W_t = \sum_{i=1}^{t} Y_i Z_i$ is a martingale. Let us set $\widetilde W_t = \sum_{i=1}^{t} Y_i |Z_i|$ and note that the
r.v.’s $\widetilde W_t$ are increasing in $t$. The proof is based on the bound

$$E\{\widetilde W_t\} \le C E\{\tau\} \text{ for all } t, \tag{2.5.10}$$

which we will prove at the end. Assume that (2.5.10) is true.
Then, there exists a proper (non-defective) r.v. $\widetilde W = \lim_{t \to \infty} \widetilde W_t = \sum_{i=1}^{\infty} Y_i |Z_i|$, and

$$E\{\widetilde W\} = E\{\lim \widetilde W_t\} = \lim E\{\widetilde W_t\} \le C E\{\tau\} < \infty.$$

(We can pass the limit sign through the expectation by the theorem on monotone convergence,
which says that we can do that if the sequence of r.v.’s under consideration is
monotone. See practically any advanced textbook on integration or Probability Theory;
say, [27, p.86], [120, p.113], [129, p.186].)
Secondly,

$$E\{|X_\tau|\} \le E\left\{\sum_{i=1}^{\infty} Y_i |Z_i|\right\} = E\{\widetilde W\} < \infty,$$

which yields (2.5.1).

Now, note that the expression on the l.-h.s. of (2.5.2)

$$E\{|X_t| 1_{\{\tau > t\}}\} \le E\left\{\sum_{i=1}^{t} |Z_i|\, 1_{\{\tau > t\}}\right\} \le E\left\{\sum_{i=1}^{\tau} |Z_i|\, 1_{\{\tau > t\}}\right\} = E\left\{\sum_{i=1}^{\infty} Y_i |Z_i|\, 1_{\{\tau > t\}}\right\} = E\{\widetilde W 1_{\{\tau > t\}}\}.$$

The last expression vanishes as $t \to \infty$ by the Lebesgue dominated convergence theorem
which says that, if r.v.’s $\eta_n \to \eta$ with probability one and $|\eta_n| \le \nu$ where $\nu$ is a r.v. with a
finite mean, then $E\{\eta_n\} \to E\{\eta\}$. See, e.g., [27, p.100], [120, p.114], [129, p.187].

In our case, $\widetilde W 1_{\{\tau > t\}} \le \widetilde W$, and $\widetilde W 1_{\{\tau > t\}} \to 0$, as $t \to \infty$, with probability one since
$P(\tau < \infty) = 1$. Thus, $E\{|X_t| 1_{\{\tau > t\}}\} \to 0$, and (2.5.2) is true.

It remains to prove (2.5.10). Let $m_{i+1} = E\{|Z_{i+1}| \mid \xi^i\}$. Then

$$E\{\widetilde W_t\} = E\left\{\sum_{i=1}^{t} Y_i |Z_i|\right\} = E\left\{\sum_{i=1}^{t} Y_i (|Z_i| - m_i)\right\} + E\left\{\sum_{i=1}^{t} Y_i m_i\right\}. \tag{2.5.11}$$

Recalling that $Y_{i+1}$ is a function of $\xi^i$, we have $E\{Y_{i+1}(|Z_{i+1}| - m_{i+1}) \mid \xi^i\} = Y_{i+1} E\{(|Z_{i+1}|
- m_{i+1}) \mid \xi^i\} = Y_{i+1}(E\{|Z_{i+1}| \mid \xi^i\} - m_{i+1}) = 0$. Hence, $E\{Y_{i+1}(|Z_{i+1}| - m_{i+1})\} = 0$, and
the first term in (2.5.11) is zero.

To estimate the second term, note that all terms in it are non-negative and, by Condition
3, $m_i \le C$. Also, $\sum_{i=1}^{\infty} Y_i = \tau$ by the definition of the $Y_i$’s. Consequently,

$$E\left\{\sum_{i=1}^{t} Y_i m_i\right\} \le C E\left\{\sum_{i=1}^{\infty} Y_i\right\} = C E\{\tau\}. \quad ∎$$


3 EXERCISES 


Section 1 


1. Provide an Excel worksheet illustrating the invariance principle of Section 1.1.2. To do this,
simulate values of some i.i.d. r.v.’s, for example, exponential or uniform, construct
sums $S_k$, and then the process $X_t^{(n)}$ (consider only points $t = k/n$). Provide charts with the
graphs of particular realizations of $X_t^{(n)}$.


2. Find P(w; +ws < 4), where w; is the standard Brownian motion. 


3. Let $w_{t1}$ and $w_{t2}$ be independent Brownian motions. Find $\sigma$ for which $w_{t1} + 2w_{t2} \stackrel{d}{=} \sigma w_t$.
(Here, as usual, the symbol $X \stackrel{d}{=} Y$ means that the distribution of $X$ is equal to the distribution
of $Y$. The r.v.’s $X$ and $Y$ themselves may be not equal to each other.)


4. Continuing Exercise 3, consider the process $x_t = \sigma_1 w_{t1} + \sigma_2 w_{t2}$, where $\sigma_1, \sigma_2$ are numbers.
Let $\sigma^2 = \sigma_1^2 + \sigma_2^2$. Show that the process $x_t/\sigma$ is a standard Brownian motion, and hence
$x_t$ may be represented as $\sigma w_t$, where $w_t$ is a standard Brownian motion. Next, consider the
case $\sigma_1 = -1$, $\sigma_2 = 0$. Explain why the fact that $-w_t$ is Brownian motion is almost obvious.
(Advice: First, show that $x_t$ is the process with independent increments. Secondly, consider
the distribution of $x_t$.)




5. Let $w_t$ be a Brownian motion, and let $a$ be a fixed parameter. Show that the process $x_t =
w_{at}/\sqrt{a}$ has the same distribution as $w_t$.


6. A physical or economic process evolves as $w_t$. You decided to measure time in a different
unit setting $t = as$, where $a$ is a fixed scale parameter and $s$ is time in the new unit. Argue that
you cannot model the basic process by $w_s$. Find $c$ for which the process $cw_s$ has the same
distribution as the basic process.


7. Assume that in the situation of Example 1.1.3-1, you decided to sell the stock not when the 
price increases by 10%, but when it drops by 10%. Will the answer change? If yes, find it. If 
no, justify the answer. 


8. The prices for two stocks evolve as the processes $S_{t1} = 10\exp\{\sigma_1 w_{t1}\}$ and $S_{t2} = 11\exp\{\sigma_2 w_{t2}\}$,
where $w_{t1}$ and $w_{t2}$ are independent Brownian motions, $\sigma_1 = 0.1$, and $\sigma_2 = 0.2$. What are the
initial prices for the stocks? Find the probability that during a year the price $S_{t1}$ will meet
(catch up) the price $S_{t2}$. (Advice: Use the result of Exercise 4.)


9. Find $\mathrm{Corr}\{w_t, w_s\}$ and show that it vanishes as $t \to \infty$ for any fixed $s$. (Advice: Use the fact
that $w_t$ is a process with independent increments.)


10. Show that $\max_{s \le t} w_s$ and $|w_t|$ have the same distribution.


11. Show that $\frac{1}{t}\max_{s \le t} w_s \stackrel{P}{\to} 0$ as $t \to \infty$. In other words, $\max_{s \le t} w_s$ is growing slower than $t$, or $\max_{s \le t} w_s = o(t)$. Show
that this obviously implies that $w_t = o(t)$. In general, for which functions $g(t)$ can we say
that $w_t = o(g(t))$ in probability?


12. (a) Compute the probability density of the r.v. $\tau_a$ from Section 1.1.3. Show that $E\{\tau_a\} = \infty$.


(b) Proceeding from the invariance principle, connect heuristically this fact with the null
recurrence of states in the symmetric random walk (see Section 4.4.5.3).


13. Prove that $E\{\max_{s \le t} w_s\} = \sqrt{2t/\pi}$. Show that to obtain this answer, it suffices to consider the case
$t = 1$. Is the expected value of the maximum value of Brownian motion on the interval $[0,2]$
twice as large as that on the interval $[0,1]$?


14. Two types of customers are calling an insurance company. Each customer is equally likely
to belong to each type, and the type of the next customer does not depend on the previous
history. We are interested in the difference between the numbers of calls from customers of
the first and second type. (Note that the difference so defined may be negative.) Estimate,
using the Brownian motion approximation, the probability that during the first $n = 100$ calls the
difference mentioned will never exceed the level 10. (Advice: When applying the invariance
principle from Section 1.1.2, set $\xi = \pm 1$. Also, the inequality $\max_{k \le n} S_k \le b$ may be
rewritten as $\max_{k \le n}(S_k/\sqrt{n}) \le b/\sqrt{n}$, and it is reasonable to set $b = a\sqrt{n}$.)


15. Using results on the log-normal distribution from Section 2.1.1, find $E\{Y_t\}$ and $\mathrm{Var}\{Y_t\}$ for
the geometric Brownian motion $Y_t$ defined in Section 1.3.


Section 2 


16. Explain why the r.v.’s $Z_t$ in (2.2.7) are martingale differences with respect to $\xi_t$, with respect
to themselves, and with respect to $X^t$, as well.


17. Show that any process $\xi_t$ with independent increments such that $E\{\xi_\Delta\} = 0$ for any interval
$\Delta$ is a martingale. Consider, as an example, $w_t$. Explain why the assertion of this exercise is
formally more general than the assertion of Example 2.2-2 (though is very close).


18. Let $N_t$ be a non-homogeneous Poisson process from Section 4.2.2.1. Is $N_t$ a martingale? Is
the process $Z_t = N_t - E\{N_t\}$ a martingale? (Advice: Revisit Exercise 17. Look up (4.2.2.1).)


19. Show that any process $X_t$ such that $E\{X_{(t,t+\delta]} \mid X^t\} = 0$ for any $t$ and $\delta > 0$ is a martingale
with respect to itself.


20. In the discrete time case, let $X_t = Z_1 + \ldots + Z_t$ and $Z_t = \xi_1 \cdot \ldots \cdot \xi_t$, where $\xi$’s are independent
and $E\{\xi_i\} = 0$. Are $Z$’s independent? Show that $X_t$ is a martingale.


21. Show that the following sequences are martingales: (a) $X_t = 2^t \exp\{-(\xi_1 + \ldots + \xi_t)\}$, where
$\xi$’s are independent and standard exponential; (b) $X_t = 2^t \xi_1 \cdot \ldots \cdot \xi_t$, where $\xi$’s are independent
and take on values 0 and 1 with equal probabilities. What is $\lim_{t \to \infty} X_t$ in both cases? (Advice:
In the first case, use the fact that, by the LLN, $(\xi_1 + \ldots + \xi_t)/t \to 1$ with probability one. In
the second case, realize how long there will be no zeros among $\xi$’s.)


22. By analogy with Example 2.2-5, consider the $n$-step binomial tree for a price process $S_t$,
$t = 0, 1, \ldots, n$, with the following property: the initial price is 100 and in each of $n$ time steps,
the price either grows by 20% or drops by 10%. Show that there are exactly $2^n$ paths that the
stock price may follow. Specify a probability measure $P$ for which $S_t$ is a martingale. What
is the probability of a particular path? (Hint: In a certain sense, this problem is simpler than
that in Example 2.2-5 since the rates of growth or fall are the same for each node of the tree.
However, for different paths, the numbers of steps where the price goes up may be different.
Once you solve the problem, you will understand better why this model is called binomial.)


23. Let $\tau$ be an optional time with respect to a process $\xi_t$. Is the occurrence of event $\{\tau > t\}$
completely determined by the values of the r.v.’s $\xi_0, \ldots, \xi_t$?


24. In the situation of Example 2.4.2-1, explain why it takes considerably long to reach the level
1 starting from the neighbor level 0. Show that $E\{\tau_a\} > a$ if $p < 1$, not appealing to formula
(2.4.6). What will happen if $p$ gets closer to 1/2?


25. In the situation of Example 2.4.2-4, prove that $E\{\tau_a\} = \infty$ if $m = 0$.


26. Consider the doubling strategy from Example 2.4.1-1 in the case of a game of roulette. Assume
that a player always bets on red. Then at each play, the probability of success is
$p = 9/19$ (there are 18 red, 18 black, and 2 green cells). Is $W_t$ a martingale? Will the player
win 1 with probability one, if there is no maximum bet and provided that the player has an
infinitely large capital? Assume that in a casino, the minimal bet is $5, the maximal bet
is $500, and the player starts with the minimal bet. What is the probability that the player
will fail to run the doubling strategy? How much will she/he lose in this case? Suppose
that a professor, when teaching the martingale theory, considered in his class of 100 students
the doubling strategy as an example, and after the lecture, all students rushed to the casino
(described above) to apply the doubling strategy. Find the probability that at least one student
will lose. Proceeding from your answer, argue that the professor had to compute this
probability in class. (Advice: A good idea is to use the Poisson approximation.)


27. Find the limit of (2.4.8) as $\mu \to 0$ and $\sigma$ is fixed. Interpret the answer.


28. Suppose the surplus process for a risk portfolio is well approximated by the Brownian motion
with drift; in other words, the process is $X_t = u + \mu t + \sigma w_t$ from Section 2.4.4. The ruin
probability for a certain choice of parameters occurs to be i: How will this probability
change if (a) the parameter $\sigma$ is doubled; (b) the process $X_t$ is multiplied by 2; (c) the initial
surplus is doubled?


29.** Consider (2.4.10)-(2.4.12) for $\mu = 0$. Interpret the answer.


30.** The level of a water reservoir changes according to the process $X_t = 1 + 2t + 3w_t$, where
$w_t$ is a standard Brownian motion. Find the probability that the reservoir (a) will never be
empty, (b) will not be empty during two units of time.


31.** Let us interpret the process $X_t$ from Exercise 30 as the surplus process of a risk portfolio.
Can the coefficient 2 be interpreted as a premium per unit of time?


32.** Assume that in the situation of Example 2.4.6-1, you decided to sell the stock not when the
price increases by 10% but when it decreases by 10%. Will the answer change? If yes, find
it. If no, justify the answer. Is the stopping time proper (non-defective) in this case? Find the
probability that the price will never drop by 10%.


33.** The prices for two stocks change as the processes $S_{t1} = 10\exp\{\mu_1 t + \sigma_1 w_{t1}\}$ and $S_{t2} =
11\exp\{\mu_2 t + \sigma_2 w_{t2}\}$, where $w_{t1}$ and $w_{t2}$ are independent Brownian motions, $\mu_1 = 0.15$, $\mu_2 =
0.11$, $\sigma_1 = 0.15$, and $\sigma_2 = 0.1$. What are the initial prices for the stocks? Find the probability
that during a year the price $S_{t1}$ will meet the price $S_{t2}$.


34.** Explain why and how for $\mu = 0$ and $\sigma = 1$ (2.4.11) leads to (1.1.8).


Chapter 6 


Global Characteristics of the Surplus 
Process. Ruin Models. 
Models with Paying Dividends 


The purpose of this chapter is to present and explore some global characteristics of the 
surplus (or reserve) process. The characteristics we consider are connected, in some sense 
or another, either with the profitability of insurance operations or with their viability, i.e., 
the degree of protection against adversity. 


1 A GENERAL FRAMEWORK 


Given a particular portfolio, we define the surplus process $R_t$ as the monetary fund “on
hand” at time $t$. Merely for illustrative purposes, we may view $R_t$ as the capital at time $t$ if
we keep in mind that it is, of course, not the whole capital of the company but rather the
available reserve of high liquidity corresponding to the portfolio under consideration.

In some results below, we consider the surplus process R, as a whole, without specifying 
its interior structure. However, in most cases, we assume 


$$R_t = u + c_t - S_{(t)}, \tag{1.1}$$

where $u$ is a fixed initial surplus, $c_t$ is the aggregate amount of the positive cash flow by
time $t$, and $S_{(t)}$ is the corresponding loss process. In this section, we mostly view $c_t$ as the
total premium collected by time $t$, and $S_{(t)}$ as the aggregate claim paid by the same time
$t$. In a more general model, $c_t$ may include results of investment, and $S_{(t)}$ some other
expenses different from claim coverage.

As in previous chapters, when adopting model (1.1) in the continuous time case, we
usually set

$$S_{(t)} = \sum_{i=1}^{N_t} X_i, \tag{1.2}$$

where $N_t$ is the process that counts consecutive claims, and $X_i$ is the amount of the $i$th claim.
As to premiums, in a typical model, $c_t = (1 + \theta)E\{S_{(t)}\}$, where $\theta$ is a relative loading
coefficient. In particular, if $N_t$ is a Poisson process with rate $\lambda$, and $m = E\{X_i\}$, then as we
saw in Chapter 4, $E\{S_{(t)}\} = m\lambda t$, and hence $c_t = (1 + \theta)m\lambda t$.
In this case, a typical realization of the surplus process looks as in Fig.1. The process 
grows linearly, and at the random moments of claim arrivals, the process drops by the 
amounts of claims. 
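A realization of this kind is easy to simulate. The following is a minimal sketch of one path of $R_t = u + c_t - S_{(t)}$ with $c_t = (1 + \theta)m\lambda t$ and Poisson claim arrivals; the exponential claim-size distribution, the function name, and the parameter values are assumptions made only for the illustration.

```python
import random

def simulate_surplus(u, theta, lam, mean_claim, horizon, rng):
    """One path of R_t = u + c*t - S_(t) with premium rate c = (1 + theta)*mean_claim*lam.

    Claims arrive as a Poisson process with rate lam; claim sizes are exponential
    with mean mean_claim (an assumption of this sketch). Returns the claim times,
    the surplus right after each claim, and the surplus at the horizon.
    """
    c = (1.0 + theta) * mean_claim * lam
    t, total_claims = 0.0, 0.0
    times, surplus_after_claim = [], []
    while True:
        t += rng.expovariate(lam)                   # exponential interarrival times
        if t > horizon:
            break
        total_claims += rng.expovariate(1.0 / mean_claim)
        times.append(t)
        surplus_after_claim.append(u + c * t - total_claims)
    return times, surplus_after_claim, u + c * horizon - total_claims

rng = random.Random(1)
times, after_claims, final = simulate_surplus(10.0, 0.1, 2.0, 1.0, 50.0, rng)
print(len(times), final)
```

Between claim times the surplus grows linearly with slope $c$, so the values right after claims are enough to draw the path.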




FIGURE 1. A typical realization of the surplus process; 
T; is the moment of the ith claim arrival. 


Subject to certain regulations and conditions, the company can choose a premium and 
an initial surplus that the company considers reasonable. In doing so, it proceeds from 
its goals, which in turn are determined by quality criteria that the company establishes for 
itself. 


If $R_t$ is viewed as a profit, one of possible criteria is the sum

$$\sum_{k=1}^{T} E\{g(R_k - R_{k-1})\},$$

where $T$ is the time horizon from which the company proceeds, and $g(\cdot)$ is a utility function.
(Since $u$ stands for the initial surplus, we use a symbol different from that from Chapter
1.) Time may be discrete or continuous, and $E\{g(R_k - R_{k-1})\}$ is the expected utility of the
profit during the $k$th period. Note that the “profit” $R_k - R_{k-1}$ may be negative.

Another criterion is the expected utility of the profit at the “final” time $T$, that is, $E\{g(R_T)\}$.
Instead of the expected utility criterion, one may consider more flexible criteria from Section 1.4.

A different approach is connected with paying dividends. We use here the term “divi- 
dend” for brevity and understand it in a broad sense: it may concern real dividends paid to 
stockholders or an amount taken from the reserve for other purposes, say, for investment. 

Consider the discrete time case, and denote by $d_t$ the dividend paid at time $t$. (Certainly,
we do not exclude the case $d_t = 0$.) Then, instead of (1.1), for the surplus at time $t$ we write

$$R_t = u + c_t - S_{(t)} - D_t, \tag{1.3}$$

where $D_t = d_1 + \ldots + d_t$, the aggregate amount of dividends paid by time $t$.

The choice of a dividend to be paid at time $t$ may (and should) depend on the current
situation. So, in general, $d_t$ is a random variable depending on the realization of the process
until the time $t$ and the strategy of paying dividends. For example, if the time horizon is
large and the initial surplus is low, it may prove to be reasonable to pay fewer dividends in
the beginning in order to avoid ruin, to let the cash process grow and to be able to pay more

1. A General Framework 341 


FIGURE 2. A realization of the surplus process in the case of ruin; $\tau$ is the time of
ruin. For the particular realization above, ruin has occurred at the moment of the
fourth claim’s arrival.


in the future. A natural criterion here is the expected discounted total payment, namely,

$$E\Big\{\sum_{t=1}^{T} v^t d_t\Big\}, \tag{1.4}$$


where v is a discount factor (see Section 0.8.3). The problem consists in finding a strategy 
maximizing (1.4). We consider this problem in Section 3. 

Criteria of another type appeal to the viability of insurance operations, which amounts to 
keeping the surplus at a proper level. One important example is 


$$P(R_t \ge k_t \text{ for all } t \le T), \tag{1.5}$$


where $T$ is a time horizon, and $k_t$ is a given level for the surplus at time $t$. The formula
above concerns both discrete and continuous time cases.

The goal of the company in this case is either to maximize (under some natural constraints)
probability (1.5), or to keep it higher than a given security level.

The simplest and most frequently considered case is that of $k_t = 0$. In this case, a traditional
notation for the probability (1.5) is $\phi_T(u)$, so

$$\phi_T(u) = P(R_t \ge 0 \text{ for all } t \le T), \tag{1.6}$$

the probability that the portfolio will be solvent during the period $[0,T]$. The initial surplus
$u$ is presented explicitly in the notation to emphasize that the probability under consideration
depends on $u$.

As in previous chapters, we call the quantity 


$$\psi_T(u) = 1 - \phi_T(u) \tag{1.7}$$


the ruin probability regarding the finite time horizon $T$. This is the probability that the
surplus process $R_t$ will assume a negative value during the period $[0,T]$; see also Fig.2.
Sometimes, (1.6) is called a survival probability. The term “survival” is traditional, but
it is important to emphasize that the same term is used in Demography and in life-insurance
modeling in another sense. Namely, in these areas, a survival probability is the
probability that an individual (or a machine) will attain a certain age. We will consider the
corresponding theory in Chapter 7.

To avoid confusion, in the context of Ruin Theory, we will also call survival probability 
a no-ruin probability. 

For $T = \infty$, we set

$$\phi(u) = P(R_t \ge 0 \text{ for all } t < \infty) \quad \text{and} \quad \psi(u) = 1 - \phi(u),$$


and call these two quantities infinite-horizon no-ruin (survival) and ruin probabilities, 
respectively. We will omit the adjective “infinite-horizon” when it does not cause misun- 
derstanding. 

In the actuarial literature, no-ruin and ruin probabilities are often denoted by $\tilde{\phi}(u)$ and
$\tilde{\psi}(u)$ in the discrete time case, and by $\phi(u)$ and $\psi(u)$ if time is continuous. When it cannot
cause misunderstanding, we will use the symbols $\phi$ and $\psi$ in both cases since quite often
we treat both cases simultaneously.


The probability $\psi(u)$ is one of the main objects of study in the next section. The models
we consider are, to some degree, idealized and do not reflect all main features of real
surplus processes. For example, we do not touch on such an important issue as investment
income. So, the corresponding results cannot be viewed as direct instructions for decision
making. Rather, these results provide some useful information about the behavior of the
insurance process, and may (and should) be taken into account together with other factors
of the insurance business. In particular, the ruin probability should be viewed as one of the
possible characteristics of the riskiness of the insurance process.


Regarding ruin models, one may also encounter the following reasoning. Ruin models do not 
take into account that the company invests its collected premiums and, as a result, the capital grows. 
On the other hand, these models do not take inflation into account. Since these two factors acting in 
the opposite directions may compensate for each other, ruin models may turn out to be more adequate to
reality than they seem at first glance.


2 RUIN MODELS 


We already considered ruin probabilities in the simple random walk scheme (Sections 
4.4.3.2 and 5.2.4.3) and for the Brownian motion with drift (Section 5.2.4.4). In this 
section, we consider models that are closer to real situations and to some degree more 
sophisticated. 

Computing ruin probabilities, especially for a finite time horizon, is a difficult problem. 
There are skillful direct computational methods, though nowadays their significance is de- 
creasing. Simulation of insurance processes, especially if it is carried out with the use of 
powerful computers, may lead to better accuracy than direct calculations do. But in this 
case, qualitative analysis that helps to see a general picture becomes even more important. 

We start in the next section with a general theory and estimation of ruin probabilities.
First, we consider the best-known (and simplest) result, Lundberg’s inequality, and after
that we turn to a more general theorem and various applications. In Sections 2.1-2.5, we
assume $R_t$ to be a process with independent increments. In Section 2.6, which belongs to Route
2, we weaken this condition, assuming $R_t$ to be a martingale. In Subsection 2.8, we will
touch briefly on some computational aspects and different approaches.

Unless stated otherwise, this section concerns ruin problems with an infinite horizon. 


2.1 Adjustment coefficients and ruin probabilities 
2.1.1 Lundberg’s inequality 


As above, let $R_t$ be a surplus process, and $u = R_0$, the initial surplus. Assume, as usual,
that $R_t$ is a process with independent increments.

It is also convenient to define the claim surplus process $W_t = u - R_t$. In particular, if (1.1)
is true, then $W_t = S_{(t)} - c_t$, the total claim minus the total premium, and $W_t$ does not depend
on $u$.

For a time interval $\Delta = (t, t+s]$, we set $W_\Delta = W_{t+s} - W_t$, the increment of the process $W_t$
over $\Delta$. Denote by $M_\Delta(z)$ the m.g.f. of $W_\Delta$.

To avoid superfluously complicated formulations, we assume from the very beginning
that for any $\Delta$,

$$P(W_\Delta = 0) \ne 1, \quad \text{and hence} \quad M_\Delta(z) \not\equiv 1. \tag{2.1.1}$$

(If $W_\Delta = 0$ with probability one, then with the same probability the premium exactly equals
the payments, which is certainly a non-realistic and trivial case.)
We call a number $\gamma > 0$ an adjustment coefficient if for any $\Delta$,

$$M_\Delta(\gamma) = 1. \tag{2.1.2}$$
Remarks: 


1. First, we comment on the very definition of an adjustment coefficient. Certainly,
(2.1.2) is true for $\gamma = 0$. The above definition presupposes that under some conditions
there exists a positive $\gamma$ for which (2.1.2) also holds. Detailed analysis will be
provided later, and for now we just clarify the significance of the above definition.


Let us consider, first, an arbitrary r.v. $\xi$ and denote by $\mu$ and $M(z)$ its mean and m.g.f.,
respectively. For the sake of simplicity, suppose for now that $M(z)$ is defined for all
$z \ge 0$, and assume also that $P(\xi = 0) \ne 1$. (Otherwise, $M(z) \equiv 1$, and the situation is
trivial.) Since $M(z) \not\equiv 1$ and is convex, $M(z)$ may equal one at most at two points, and
one of them is $z = 0$. Denote by $p$ another, if any, solution to the equation $M(z) = 1$.


As we know, $M(z)$ is convex, $M(0) = 1$, and $M'(0) = \mu$. Hence, if $\mu \ge 0$, then the
function $M(z)$ is non-decreasing, and a typical graph of $M(z)$ looks as in Fig.3a. In
this case, a positive solution $p$ does not exist.


If $M'(0) = \mu < 0$, then starting from one at $z = 0$, the graph of $M(z)$ “goes down at
least for a while”; see Fig.3bc. Since $M(z)$ is convex, two situations may take place.
Either the graph looks as in Fig.3b, and a solution $p > 0$ exists and is unique; or the
graph looks as in Fig.3c, and a finite positive solution does not exist. In the latter case, by
convention, we set $p = \infty$. We will see below that it is possible only if $P(\xi \le 0) = 1$.




FIGURE 3. (a) The case $M'(0) = \mu \ge 0$. (b) The case $M'(0) = \mu < 0$ and a finite $p$.
(c) The case $M'(0) = \mu < 0$ and $p = \infty$. The broken lines are tangent to $M(z)$.


In our particular case where $\xi = W_\Delta$, among the three possibilities mentioned, only
the second may be viewed as realistic. Indeed, if the mean claim surplus $E\{W_\Delta\} \ge 0$
for all $\Delta$, this may be viewed as “too bad”, and there is no reason for the company to
function. As we will see, in this case the ruin probability equals one. On the other
hand, if $P(W_\Delta \le 0) = 1$, for the company it is “too good” to be true: in this case,
clients always pay not less than the company pays them. Later, we will consider all
of this in more detail.


2. It may seem implausible that (2.1.2) may be true with the same $\gamma$ for all $\Delta$. As a
matter of fact, as we will see, the point here is that as a rule

$$M_\Delta(z) = \exp\{q_1(\Delta)\, q_2(z)\}, \tag{2.1.3}$$

where $q_1$ and $q_2$ are separate functions of $\Delta$ and $z$, respectively. So, we can try to find
$\gamma$ for which $q_2(\gamma) = 0$, and then (2.1.2) will be true for all $\Delta$.


3. The reader who did not omit Section 5.2 on martingales or who is familiar with this
notion, may notice that together with the independent-increments condition, (2.1.2)
implies that the process $Y_t = \exp\{\gamma W_t\}$ is a martingale. We show this in detail in
Section 2.6. We will also see in this section that, as a matter of fact, for subsequent
results to be true, we need only $Y_t$ to be a martingale, and the independence of increments
is not necessary.


We proceed to results. Let $\psi(u)$ be a ruin probability as it was defined in Section 1.


Proposition 1 (Lundberg’s inequality). If the adjustment coefficient $\gamma$ exists, then the
ruin probability

$$\psi(u) \le \exp\{-\gamma u\}. \tag{2.1.4}$$


More remarks: 


4. This famous inequality gives an estimate for the ruin probability with some leeway:
we will see in examples below that, as a rule, the real ruin probability is less than the
r.-h.s. of (2.1.4). But the estimate is simple and tractable and has the advantage that
the total information about the process is accumulated in one parameter $\gamma$. In the next
sections, we consider many examples.


5. If we face the situation illustrated in Fig.3a, we can set $\gamma = 0$ in (2.1.4). The inequality
will become trivial (the r.-h.s. equals one), but will be still true.


6. If for any $\Delta$, the claim surplus $W_\Delta \le 0$ with probability one, then ruin is impossible.
It is also reflected in (2.1.4): in this case, we face the situation illustrated in Fig.3c
(details will be provided later), so we can set $\gamma = \infty$, and the r.-h.s. of (2.1.4) equals
zero for any $u > 0$.


7. This remark is important and concerns the initial surplus u. It is reasonable to view it 
not as the initial surplus at the time when the insurance process had started, but rather 
as the current surplus at the present time. At each time moment, we can recalculate 
the ruin probability, depending on the amount of the surplus at the current time. 


EXAMPLE 1. Assume that the loss process $S_{(t)}$ is a homogeneous compound Poisson
process with unit rate and claims having the standard exponential distribution.
Let $c_t = (1+\theta)t$, where $\theta$ is the security loading coefficient. As we will compute
in Section 2.5.1, in this case, the adjustment coefficient $\gamma = \frac{\theta}{1+\theta}$ and the ruin
probability itself is

$$\psi(u) = \frac{1}{1+\theta} \exp\left\{-\frac{\theta u}{1+\theta}\right\}.$$

Suppose $\theta = 0.1$ and the company has started the corresponding insurance business
with an initial surplus of 30 units of money. Then the ruin probability $\psi(30) =
\frac{1}{1.1}\exp\{-0.1 \cdot 30/1.1\} \approx 0.059$. Suppose that during some period the company has
been lucky and the total premium collected turned out to be 5 units larger than the
total payment. So, the current surplus became equal to 35 units. In this case, we may
forget the previous estimate. Now, the company is in a better position, and the new
ruin probability equals $\psi(35) = \frac{1}{1.1}\exp\{-0.1 \cdot 35/1.1\} \approx 0.038$.
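
The two numbers in this example, and the corresponding Lundberg bounds, can be reproduced with a short Python sketch. The function names are hypothetical; the formulas are those quoted in the example for unit-rate Poisson arrivals and standard exponential claims.

```python
import math

def ruin_probability(u, theta):
    """psi(u) = exp(-theta*u/(1+theta)) / (1+theta), as quoted in the example."""
    return math.exp(-theta * u / (1.0 + theta)) / (1.0 + theta)

def lundberg_bound(u, theta):
    """Right-hand side of Lundberg's inequality with gamma = theta/(1+theta)."""
    gamma = theta / (1.0 + theta)
    return math.exp(-gamma * u)

theta = 0.1
for u in (30.0, 35.0):
    print(f"u = {u:4.0f}: psi(u) = {ruin_probability(u, theta):.3f}, "
          f"Lundberg bound = {lundberg_bound(u, theta):.3f}")
```

In this model the bound exceeds the exact value by the constant factor $1+\theta$, which illustrates the "leeway" mentioned in Remark 4.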


In this context, one more thing is noteworthy. Certainly, if the current surplus be- 
came larger than the initial, the insurer may take some money from the surplus, for 
example, paying dividends, and return to the initial amount. However, we should 
understand that in this case, the situation will have changed. Lundberg’s inequality 
is true under the assumption that the current surplus will not be “touched”. But if we 
suppose that in the future we may change the fund “on hand”, then such a supposi- 
tion should be taken into account in calculating the ruin probability. In this case, we 
face another situation, namely, that of paying dividends. We consider such a scheme 
in Section 3. 


2.1.2 Proof of Lundberg’s inequality 
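First, note that, by definition, $W_0 = 0$, and hence, $W_{(0,t]} = W_t$. Let $\tau$ denote the moment of ruin,
that is, the first time at which $R_t < 0$. Since (2.1.2) is true for any $\Delta$, for any $T > 0$,

$$
\begin{aligned}
1 = M_{(0,T]}(\gamma) = E\{e^{\gamma W_T}\}
&= E\{e^{\gamma W_T} \mid \tau \le T\}P(\tau \le T) + E\{e^{\gamma W_T} \mid \tau > T\}P(\tau > T) \\
&\ge E\{e^{\gamma W_T} \mid \tau \le T\}P(\tau \le T)
= E\{e^{\gamma W_\tau} e^{\gamma (W_T - W_\tau)} \mid \tau \le T\}P(\tau \le T).
\end{aligned}
$$

Since by definition $R_\tau < 0$, we have $W_\tau = u - R_\tau > u$. Consequently, if we replace $e^{\gamma W_\tau}$ by
$e^{\gamma u}$, then the resulting expression will not get larger, and we may write that

$$1 \ge E\{e^{\gamma u} e^{\gamma (W_T - W_\tau)} \mid \tau \le T\}P(\tau \le T) = e^{\gamma u} E\{e^{\gamma (W_T - W_\tau)} \mid \tau \le T\}P(\tau \le T). \tag{2.1.5}$$

Since $R_t$ is a process with independent increments, so is $W_t$. Hence, given that $\tau$ equals
some $s \le T$, the distribution of the r.v. $W_T - W_\tau$ is equal to the distribution of the r.v.
$W_T - W_s$, and does not depend on the values of the process $W_t$ for $t \le s$. Then

$$E\{\exp\{\gamma (W_T - W_\tau)\} \mid \tau = s \le T\} = E\{\exp\{\gamma W_{(s,T]}\} \mid \tau = s \le T\} = E\{\exp\{\gamma W_{(s,T]}\}\} = 1$$

by the same property (2.1.2). Let $F(s) = P(\tau \le s \mid \tau \le T)$, the conditional d.f. of $\tau$. Then,
by the formula for total expectation,

$$E\{e^{\gamma (W_T - W_\tau)} \mid \tau \le T\} = \int_0^T E\{e^{\gamma (W_T - W_\tau)} \mid \tau = s \le T\}\,dF(s) = \int_0^T 1 \cdot dF(s) = 1.$$

From this and (2.1.5) it follows that $1 \ge e^{\gamma u}P(\tau \le T)$, and hence $P(\tau \le T) \le e^{-\gamma u}$. The
r.-h.s. of the last inequality does not depend on $T$. So, we can write that $\psi(u) = P(\tau <
\infty) = \lim_{T \to \infty} P(\tau \le T) \le e^{-\gamma u}$. ■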


2.1.3 The main theorem 


Particular examples of the applications of Lundberg’s inequality will be considered in
Section 2.2, but first let us state a theorem that gives a precise representation of the ruin
probability.

To this end, we need one more formal and very mild condition. Namely, since we con- 
sider an infinite horizon, we need an assumption concerning the behavior of the process at 
infinity. Loosely put, we assume that for large time horizons the aggregate surplus should 
be large. Formally, we require that 


$$R_t \xrightarrow{P} +\infty \quad \text{as } t \to \infty \tag{2.1.6}$$


(for this type of convergence see Section 0.5). 

It is worthwhile to emphasize two things. First, (2.1.6) does not concern the ruin is- 
sue, and when imposing this requirement, we do not exclude that the process may take on 
negative values “on the way to infinity”. 

Secondly, this formal assumption holds in all reasonable models of insurance processes 
without paying dividends, including all particular models we consider in this book (except- 
ing models with paying dividends in Section 3). So, in the first reading, the examples in 
this subsection may be even omitted. 

Note also that the condition (2.1.6) is strongly connected with the LLN. For example,
let $R_t = u + c_t - S_{(t)}$, where $S_{(t)}$ is the aggregate claim, the premium $c_t = (1+\theta)E\{S_{(t)}\}$,
and $\theta$ is a positive relative loading coefficient. Then

$$R_t = u + \theta E\{S_{(t)}\} - [S_{(t)} - E\{S_{(t)}\}] \tag{2.1.7}$$


and (2.1.6) will be true if $E\{S_{(t)}\} \to \infty$ (the aggregate claim is large on the average for
large $t$), and the third term in (2.1.7) is small with respect to the second (the deviation of
the payment $S_{(t)}$ from its expected value is smaller than the expected value itself). More
precisely, this means that for any arbitrarily small $\varepsilon > 0$,

$$P\left(|S_{(t)} - E\{S_{(t)}\}| > \varepsilon E\{S_{(t)}\}\right) \to 0 \quad \text{as } t \to \infty. \tag{2.1.8}$$

This is just another form of the LLN. By Chebyshev’s inequality (see (0.2.5.3)),

$$P\left(|S_{(t)} - E\{S_{(t)}\}| > \varepsilon E\{S_{(t)}\}\right) \le \frac{\mathrm{Var}\{S_{(t)}\}}{\varepsilon^2 (E\{S_{(t)}\})^2}.$$

We see that for (2.1.8) to be true, it suffices that

$$\mathrm{Var}\{S_{(t)}\} = o\left((E\{S_{(t)}\})^2\right). \tag{2.1.9}$$

(For the little $o$ notation, see Appendix, Section 4.1.) This is a very mild condition.


EXAMPLE 1. Let time be discrete, and $S_{(t)} = S_t = X_1 + \ldots + X_t$, where the $X$’s (claims
at separate time moments) are i.i.d. Then $E\{S_{(t)}\} = mt$ and $\mathrm{Var}\{S_t\} = \sigma^2 t$, where $m =
E\{X_i\}$ and $\sigma^2 = \mathrm{Var}\{X_i\}$. If $m > 0$, then $E\{S_{(t)}\} \to \infty$, and (2.1.9) is also true.


EXAMPLE 2. Let $S_{(t)}$ be a homogeneous compound Poisson process. In the notation of
Section 4.3,

$$E\{S_{(t)}\} = m\lambda t, \qquad \mathrm{Var}\{S_{(t)}\} = (\sigma^2 + m^2)\lambda t.$$

So, again both requirements, $E\{S_{(t)}\} \to \infty$ and (2.1.9), are true.
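
As a worked check of (2.1.9) for the compound Poisson case, the ratio of the variance to the squared mean can be written out explicitly. This is only a restatement of the two moments $E\{S_{(t)}\} = m\lambda t$ and $\mathrm{Var}\{S_{(t)}\} = (\sigma^2 + m^2)\lambda t$, with $m > 0$ assumed:

```latex
\[
\frac{\operatorname{Var}\{S_{(t)}\}}{\left(E\{S_{(t)}\}\right)^{2}}
= \frac{(\sigma^{2}+m^{2})\,\lambda t}{(m\lambda t)^{2}}
= \frac{\sigma^{2}+m^{2}}{m^{2}\lambda}\cdot\frac{1}{t}
\;\longrightarrow\; 0 \quad \text{as } t \to \infty .
\]
```

The ratio decays like $1/t$, which is exactly the little-$o$ relation required.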


EXAMPLE 3. If $S_{(t)}$ is a non-homogeneous compound Poisson process, using the notation
of Section 4.3, we write

$$E\{S_{(t)}\} = m\chi(t), \qquad \mathrm{Var}\{S_{(t)}\} = (\sigma^2 + m^2)\chi(t),$$

and (2.1.6) is true if $\chi(t)$, the mean number of claims by time $t$, converges to infinity.


For the reader who did not skip Chapter 5, or who is familiar with the notion of Brownian 
motion, consider 


EXAMPLE 4. Let $S_{(t)} = \mu t + \sigma w_t$, a Brownian motion with drift. It is natural to assume
$\mu > 0$. Then $E\{S_{(t)}\} = \mu t \to \infty$, $\mathrm{Var}\{S_{(t)}\} = \sigma^2 t$, and (2.1.9) again holds.


The reader can suggest other examples. In particular, it is not necessary for the $X_i$’s above
to be identically distributed.

Thus, we adopt the assumption (2.1.6) and turn to the main result. 

Let $\tau = \tau_u$ be the moment of ruin; see also Fig.2. Formally, $\tau = \min\{t : R_t < 0\}$. If the
process $R_t$ never assumes a negative value (no ruin occurs), we indicate this by writing $\tau = \infty$.

As was repeatedly noted in Chapters 4 and 5, the r.v. $\tau$ may be defective; that is, it may
happen that $P(\tau_u < \infty) < 1$ and $\tau_u = \infty$ with a positive probability. Moreover, this should be
the case when we are modeling real surplus processes because $P(\tau_u < \infty)$ equals the ruin
probability $\psi(u)$, and we want this probability to be small.

Below we omit the index $u$ in $\tau_u$.




Theorem 2 Let (2.1.6) be true, and $\gamma > 0$ be the adjustment coefficient defined above.
Then

$$\psi(u) = \frac{\exp\{-\gamma u\}}{E\{\exp\{-\gamma R_\tau\} \mid \tau < \infty\}}. \tag{2.1.10}$$


So, the theorem gives a precise expression for the ruin probability. The denominator in 
(2.1.10) looks somewhat complicated, though, as we will see in Section 2.5, there are cases 
when it may be easily computed. 

Lundberg’s inequality follows from (2.1.10) immediately. Indeed, by definition, $R_\tau < 0$
and $\gamma > 0$. Hence, $\exp\{-\gamma R_\tau\} > 1$. So, the denominator in (2.1.10) is not less than one,
which implies (2.1.4).

We prove Theorem 2 in Section 2.6 using a martingale technique. For the reader who 
does not plan yet to learn martingales, we have given above a direct proof of Lundberg’s 
inequality. 


2.2 Computing adjustment coefficients 


In the first subsection below, we consider conditions under which equation (2.1.2) has a
positive solution. This is of rather mathematical significance because


If in a particular problem we manage to find an adjustment
coefficient $\gamma > 0$, then we can be sure that it is unique and we can
use it applying either Lundberg’s inequality or Theorem 2.


This is true because, as any m.g.f., $M_\Delta(z)$ is convex. Therefore, if $M_\Delta(z) \not\equiv 1$, then, as
has been already noted, $M_\Delta(z)$ may equal 1 only at two points. The first point is zero, and
the other is $\gamma$. See more details below.

So, the reader who is interested rather in applications may skip Subsection 2.2.1 and 
move directly to Subsection 2.2.2 that deals with concrete models. 


Route 1 => page 351 


2.2.1 A general proposition 


Consider a r.v. $\xi$ whose m.g.f. $M(z)$ exists and is finite for all $z \in [0, z_0)$, where $0 < z_0 \le \infty$.
We assume that $z_0$ is the largest number with this property; that is, $M(z) = \infty$ for $z > z_0$.

To clarify this, let us recall that if, for example, $\xi$ is uniformly distributed on some interval,
$M(z)$ exists for all $z$ and hence $z_0 = \infty$. On the other hand, if $\xi$ is exponential, then
$M(z)$ exists only for $z < 1/\mu$, where $\mu = E\{\xi\}$. So, $z_0 = 1/\mu$. (See Section 0.4.3.)

As for the point $z_0$ itself, $M(z_0)$ may or may not exist. In Example 1 below, we consider
the case when $M(z_0)$ is finite. However, for the exponential distribution, $M(z)$ does not
exist at $z_0 = 1/\mu$.

We do not consider M (z) at negative z’s. 

Note also that, in general, we do not have to assume $\mu = E\{\xi\}$ to be finite. Since there
exists a positive $z$ for which $M(z) < \infty$, for such a $z$ we can write $\mu = \frac{1}{z}E\{z\xi\} \le \frac{1}{z}E\{e^{z\xi}\} =
\frac{1}{z}M(z) < \infty$. However, we do not exclude the case where $\mu = -\infty$, that is, the negative part
of $\xi$ has an infinite expectation. In this case, the reasoning below remains correct.
Consider the equation

$$M(z) = 1, \quad z > 0. \tag{2.2.1}$$

We assume that

$$P(\xi = 0) \ne 1, \tag{2.2.2}$$

since otherwise $M(z) \equiv 1$, and equation (2.2.1) is trivial.


Proposition 3 If (2.2.2) holds, then the following is true.


(a) A positive solution to (2.2.1) exists if and only if $\mu < 0$ and

$$M(z_0) \ge 1. \tag{2.2.3}$$


(b) If $\mu < 0$ but (2.2.3) does not hold, then $M(z) < 1$ for all positive $z$ for which $M(z)$
exists.


(c) If the equation (2.2.1) has a positive solution, then this is the only positive solution,
and (2.2.3) is true.


Before proving this proposition, we clarify the sense of its conditions. In Section 2.1.1,
in particular in Fig.3, we have already shown the role played by $\mu$. Consider now condition
(2.2.3).

First of all, if $M(z_0) = \infty$, whatever $z_0$ is, finite or infinite, then (2.2.3) holds automatically.
In particular, this is the case when $z_0 = \infty$ and

$$P(\xi > 0) > 0. \tag{2.2.4}$$

Indeed, (2.2.4) implies that there exists $b > 0$ such that

$$P(\xi > b) > 0. \tag{2.2.5}$$

(If $P(\xi > b) = 0$ for all $b > 0$, we can write that $P(\xi > 0) = \lim_{b \to 0,\, b > 0} P(\xi > b) = 0$,
which would contradict (2.2.4).) Hence, denoting by $F(x)$ the d.f. of $\xi$, we can write

$$M(z) = \int_{-\infty}^{\infty} e^{zx}\,dF(x) \ge \int_{x > b} e^{zx}\,dF(x) \ge e^{zb} \int_{x > b} dF(x) = e^{zb} P(\xi > b) \to \infty \quad \text{as } z \to \infty.$$

On the other hand, if $P(\xi > 0) = 0$, then in view of (2.2.2), $M(z) = E\{e^{z\xi}\} < 1$ for all $z > 0$.
(Indeed, $e^{z\xi} < 1$ with a positive probability, and $P(e^{z\xi} > 1) = 0$.) So, in this case, there is
no solution to (2.2.1), and condition (2.2.3) also does not hold. This situation has been
illustrated in Fig.3c.

Now, let $z_0 < \infty$. Again, if $M(z_0) = \infty$, the condition (2.2.3) holds automatically. For
example, let $\xi = X - c$ where $X$ may be interpreted as a claim and is exponential, and $c$ is
viewed as a premium. Then $M(z) = \frac{e^{-cz}}{1 - zE\{X\}} \to \infty$ as $z \to 1/E\{X\}$. The situation is
illustrated in Fig.4a.

However, it may happen that $M(z_0) < \infty$, while $M(z) = \infty$ for all $z > z_0$.




EXAMPLE 1. Let $\xi = X - c$ where $c > 0$ and a positive r.v. $X$ has the density

$$f(x) = \frac{K e^{-x}}{1 + x^2}, \quad x \ge 0,$$

and $K$ is a constant for which $\int_0^\infty f(x)\,dx = 1$. Then

$$M(z) = E\{e^{z(X - c)}\} = K e^{-cz} \int_0^\infty \frac{e^{(z-1)x}}{1 + x^2}\,dx.$$

The last integral diverges (or equals infinity) for $z > 1$, and it is finite for $z \le 1$. Hence, in
our case, $z_0 = 1$. On the other hand,

$$M(1) = K e^{-c} \int_0^\infty \frac{dx}{1 + x^2} < \infty. \tag{2.2.6}$$

Thus, $M(z)$ is defined at $z \le 1$.
Let us explore other features of this particular case. The integral in (2.2.6) equals
$\arctan(\infty) = \pi/2$, so we have

$$M(1) = K_1 e^{-c}, \quad \text{where } K_1 = K\frac{\pi}{2}.$$

Now observe that $K_1 > 1$ because $K_1 = K \int_0^\infty \frac{dx}{1 + x^2} > K \int_0^\infty \frac{e^{-x}}{1 + x^2}\,dx = \int_0^\infty f(x)\,dx = 1$.

Hence, $M(1) > 1$ for sufficiently small $c$. The situation is illustrated in Fig.4b. We see
that in this case, a solution $p$ to equation (2.2.1) exists.

On the other hand, for sufficiently large $c$, we have $M(1) < 1$; see Fig.4c. In this case, a
solution to (2.2.1) does not exist. However, since $M(z) = \infty$ for $z > 1$ (so to say, at $z = 1$
the function $M(z)$ jumps over 1), we may view 1 as a solution.
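
The constants in this example can be estimated numerically. The sketch below assumes the density is $f(x) = K e^{-x}/(1+x^2)$ on $x \ge 0$, which is what the surrounding computation suggests; the helper `simpson` and the truncation point 50 are illustrative choices, not part of the text.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

# Normalizing integral of exp(-x)/(1+x^2); the tail beyond x = 50 is below exp(-50).
norm = simpson(lambda x: math.exp(-x) / (1.0 + x * x), 0.0, 50.0)
K = 1.0 / norm
K1 = K * math.pi / 2.0       # M(1) = K1 * exp(-c)
c_threshold = math.log(K1)   # M(1) > 1 exactly when c < ln(K1)
print(f"K = {K:.4f}, K1 = {K1:.4f}, M(1) > 1 for c < {c_threshold:.4f}")
```

The value $\ln K_1$ makes the phrases "sufficiently small $c$" and "sufficiently large $c$" concrete for this particular density.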


Not going deeply into it, note that in the situations similar to that in Example 1, $z_0$ may
be considered a solution to (2.2.1) and used in the context of Lundberg’s inequality.

We proceed to a formal

Proof of Proposition 3. First, note that since $P(\xi \ne 0) > 0$, the second derivative

$$M''(z) = E\{\xi^2 e^{z\xi}\} > 0. \tag{2.2.7}$$


FIGURE 4.


(For the sake of rigor, note that $M''(0)$ may be infinite, which does not contradict (2.2.7),
but $M''(z)$ is finite for $0 < z < z_0$. Indeed, the integral over $(-\infty, 0)$ is finite because the
function $x^2 e^{zx}$ is bounded there ($x$’s are negative, $z$ is positive). The integral over $[0, \infty)$ is
bounded because $M(z)$ is bounded for $0 < z < z_0$. See also Section 0.4.4.)

Now, let $\mu \ge 0$. Then $M'(0) \ge 0$, and by virtue of (2.2.7), $M(z)$ is strictly increasing.
Hence, (2.2.1) does not have a solution; see also Fig.3a.

Consider the case $\mu < 0$ and $M(z_0) < 1$ where $z_0$ is either finite or infinite. We show that
in this case $M(z) < 1$ for all $z > 0$, and there is no solution to (2.2.1) (or, as was said above,
a solution $p = \infty$; see Fig.3c). Indeed, assume that there exists $z_1 < z_0$ such that $M(z_1) \ge 1$.
Since $M'(0) < 0$, the function $M(z) < 1$ in a neighborhood of 0, and there exists $z_2$ such
that $0 < z_2 < z_1$ and $M(z_2) < 1$. So, $M(0) = 1$, $M(z_2) < 1$, $M(z_1) \ge 1$, $M(z_0) < 1$ for
$0 < z_2 < z_1 < z_0$, which contradicts the convexity of $M(z)$.

Now, let $\mu < 0$ and $M(z_0) \ge 1$. Let $z_2$ be the same as above: $M(z_2) < 1$. If $M(z_0) > 1$,
then the existence of a solution follows immediately from the continuity of $M(z)$.

Let $M(z_0) = 1$. Then $z_0$ cannot be infinite. Indeed, if $z_0 = \infty$, then $M(0) = 1$, $M(z_2) < 1$,
and $\lim_{z \to \infty} M(z) = 1$, which contradicts the convexity of $M(z)$.

But if $M(z_0) = 1$ and $z_0 < \infty$, then $z_0$ is a solution to (2.2.1).

Now, assume that there exists a positive solution $p$. Because $M(z)$ is convex, and $M''(z) >
0$, the graph of $M(z)$ may intersect any line only in two points. Hence, $M(z) = 1$ at only
two points: zero and $p$. Consequently, $p$ is a unique positive solution. Then $M(z_0)$ is larger
than 1 or equal to one, and in the latter case the solution $p$ is $z_0$. ■


We proceed to particular cases. 


2.2.2 The discrete time case: Examples 


Let $t = 0, 1, \ldots$, and $W_t = Y_1 + \ldots + Y_t$, where $Y_i$ is the claim surplus during the $i$th period.
We assume the $Y$’s to be i.i.d.

As a rule, $Y_i = X_i - c$, where $X_i$ and $c$ are the aggregate claim and premium, respectively,
corresponding to the $i$th period. Below, we omit the adjective “aggregate” if it cannot cause
misunderstanding.

Formally, we should solve the equation (2.1.2) for all intervals $\Delta = (k, k+t]$, where
$k, t$ are integers. However, since the $Y$’s are identically distributed, it suffices to consider
$\Delta = (0, t]$, and accordingly the r.v. $W_t$.


Because the $Y$’s are independent, the m.g.f. $M_\Delta(z) = M_{W_t}(z) = E\{\exp\{zW_t\}\} = (M_Y(z))^t$,
where $M_Y(z)$ is the common m.g.f. of the r.v.’s $Y_i$. Thus, $M_\Delta(z) = 1$ if and only if

$$M_Y(z) = 1. \tag{2.2.8}$$

This is an equation for the adjustment coefficient $\gamma$.
We assume that

$$E\{Y\} < 0, \quad \text{and} \tag{2.2.9}$$
$$P(Y > 0) > 0 \tag{2.2.10}$$

(the index $i$ is omitted).




So, we are in the situation of Fig.3b. To show this absolutely rigorously, one may apply
Proposition 3, but as a matter of fact, we need not worry much about the conditions of
this proposition. As has been noted in the beginning of Section 2.2 (and as was stated in
Proposition 3 itself), if we manage to find a positive solution to (2.2.8), this solution will
be unique and all conditions of Proposition 3 will be true automatically.

Let now $Y_i = X_i - c$. Then $E\{Y_i\} = m - c$, where $m = E\{X_i\}$. The condition $E\{Y\} < 0$ is
equivalent to the condition

$$c > m,$$

and condition (2.2.10) is equivalent to the condition $P(X_i > c) > 0$ or

$$P(X_i \le c) < 1.$$

Both conditions are natural; regarding the latter, note that nobody will pay a premium that
is larger than or equal to the future payment with probability one.
Now, note that equation (2.2.8) may be rewritten as

$$e^{-cz} M_X(z) = 1, \tag{2.2.11}$$

where $M_X(z)$ is the common m.g.f. of the $X$’s. (See (0.4.1.5).)
In examples below, when considering a separate claim $X_i$, we omit the index $i$.


EXAMPLE 1. Assume that the claim $X$ has a $\Gamma$-density, say, $f(x) = xe^{-x}$. Then $E\{X\} =
2$, the m.g.f. $M_X(z) = 1/(1-z)^2$ for $z < 1$. Substituting this into (2.2.11), we come to

$$e^{-cz} = (1-z)^2. \tag{2.2.12}$$

However, we should remember that for $z \ge 1$, the m.g.f. $M_X(z)$ does not exist; so we should
accept only solutions $z < 1$. In Exercise 5a, the reader is invited to graph the r.-h.s. and
the l.-h.s. of (2.2.12).

It is impossible to write a solution to (2.2.12) explicitly, but it is easy to solve it numerically,
even using a graphing calculator. For example, for $c = 2.15$, we will readily find that
for a solution $\gamma$, we have $0.136 < \gamma < 0.137$. Thus,

$$\psi(u) \le \exp\{-\gamma u\} \le \exp\{-0.136u\}.$$

(To have a correct inequality, we should take the lower (!) bound for $\gamma$.)
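
The numerical solution mentioned above can be sketched with plain bisection. The code works with the logarithm of $e^{-cz}M_X(z)$ for the density $f(x) = xe^{-x}$, that is, with $-cz - 2\ln(1-z)$; the function names and tolerances are illustrative assumptions.

```python
import math

def log_mgf_claim_surplus(z, c):
    """ln of e^{-cz} * M_X(z) with M_X(z) = 1/(1-z)^2, valid for 0 <= z < 1."""
    return -c * z - 2.0 * math.log(1.0 - z)

def adjustment_coefficient(c, lo=1e-6, hi=1.0 - 1e-9, tol=1e-12):
    """Bisection for the positive root (requires c > E{X} = 2).

    The function is negative just to the right of 0 and tends to
    +infinity as z -> 1, so the bracket [lo, hi] contains the root.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_mgf_claim_surplus(mid, c) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma = adjustment_coefficient(2.15)
print(f"gamma = {gamma:.4f}")
```

Bisection is used here only because it needs no derivative and cannot leave the interval $(0, 1)$ where the m.g.f. exists; any bracketing root finder would serve equally well.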


EXAMPLE 2. Let $X$ be well approximated by a $(m, \sigma^2)$-normal distribution. Then
$M_X(z) = \exp\{mz + \sigma^2 z^2/2\}$, and equation (2.2.11) may be rewritten as

$$-cz + mz + (\sigma^2 z^2/2) = 0.$$

The positive root is

$$\gamma = \frac{2(c-m)}{\sigma^2}. \tag{2.2.13}$$




The particular expression (2.2.13) gives an idea of the following approximation in the
general case. Taking the logarithm of both sides of (2.2.11), we can rewrite it as

$$-cz + \ln M_X(z) = 0. \tag{2.2.14}$$

Assume that the root of the equation is “small”. In (0.4.5.7), we have derived the approximation
formula

$$\ln M_X(z) = mz + \frac{1}{2}\sigma^2 z^2 + o(z^2),$$

where $m = E\{X\}$, $\sigma^2 = \mathrm{Var}\{X\}$, and $o(z^2)$ is a remainder negligible with respect to $z^2$ for
small $z$’s. So,

$$-cz + mz + (\sigma^2 z^2/2) + o(z^2) = 0. \tag{2.2.15}$$


If we neglect the term $o(z^2)$, the positive solution to (2.2.15) will lead us to the approximation

$$\gamma \approx \frac{2(c-m)}{\sigma^2}. \tag{2.2.16}$$

The accuracy of such an approximation can be estimated with use of bounds for the
remainder in Taylor’s expansion, but we restrict ourselves to an example.
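
For completeness, the algebra behind the approximation is a single step: drop the remainder in (2.2.15) and divide by $z > 0$. A worked version:

```latex
\[
-cz + mz + \frac{\sigma^{2}z^{2}}{2} = 0,\quad z > 0
\;\Longrightarrow\;
-(c - m) + \frac{\sigma^{2}z}{2} = 0
\;\Longrightarrow\;
z = \frac{2(c - m)}{\sigma^{2}} .
\]
```

This coincides with the exact root (2.2.13) in the normal case, where the remainder term is identically zero.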


EXAMPLE 3. For the data from Example 1, $\sigma^2 = 2$, and (2.2.16) gives

$$\gamma \approx \frac{2 \cdot 0.15}{2} = 0.15,$$

which is not so bad in comparison with the answer from Example 1 ($\approx 0.136$).


Let now $c = 2.05$. Then, as is easy to compute, $0.048 < \gamma < 0.049$, while (2.2.16) gives
0.05.


2.2.3 The discrete time case: The adjustment coefficient for a group of 
insured units. 


It is natural to view the loss (or claim) r.v. $X_i$ above as the aggregate claim coming from
a portfolio of insured units. Here we present it in an explicit way. Let

$$X_i = X_{i1} + \ldots + X_{in_i}, \tag{2.2.17}$$

where $n_i$ is the number of units composing the portfolio in the $i$th period, and $X_{ij}$ is the
payment to the $j$th unit in the same period. We assume $\{X_{ij}\}$ to be i.i.d. r.v.’s. Note that
in this case, the $X_i$’s are independent, but perhaps are not identically distributed because the
number of terms in the sum (2.2.17) depends on $i$.

Denote by $c$ the premium for a particular unit, and set $Y_{ij} = X_{ij} - c$, the claim surplus
of the $j$th unit in the $i$th period. Since time is discrete, for any time interval $\Delta = (t, t+k]$,
where $t, k$ are integers,

$$W_\Delta = \sum_{i=t+1}^{t+k} Y_i = \sum_{i=t+1}^{t+k} \sum_{j=1}^{n_i} Y_{ij}, \tag{2.2.18}$$

and the m.g.f. $M_\Delta(z) = (M_Y(z))^s$, where $M_Y(z)$ is the common m.g.f. of the r.v.’s $Y_{ij}$, and $s =
n_{t+1} + \ldots + n_{t+k}$, the total number of all terms in (2.2.18).




Thus, $M_\Delta(z) = 1$ iff $M_Y(z) = 1$, and we have come to a nice conclusion:


The adjustment coefficient for a homogeneous portfolio is equal to
the adjustment coefficient for one separate insured unit.        (2.2.19)


EXAMPLE 4. When considering the above model, it is natural to assume that the loss for a particular unit is equal to zero with a positive and “substantial” probability. Let, say, $X_{ij} = 0, 10, 20$ with probabilities 0.8, 0.1, 0.1, respectively. Then $E\{X_{ij}\} = 3$. Suppose $\tilde c = 3.5$.

To compute the adjustment coefficient $\gamma$, we do not need any information about the numbers of units, and the equation for $\gamma$ is the same equation (2.2.11) where $X$ corresponds to $X_{ij}$, and $c$ should be replaced by $\tilde c$. Using version (2.2.14), we have

$$-3.5z + \ln\left(0.8 + 0.1e^{10z} + 0.1e^{20z}\right) = 0.$$

An analytical solution is again impossible, but it is easy to solve the equation numerically. The reader can verify that the positive solution is $\gamma \approx 0.022$.
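The numerical solution mentioned in Example 4 can be sketched in a few lines of Python. This is only an illustration: the function names and the bracketing interval [0.001, 0.1] are assumptions made for the sketch, and plain bisection is just one of many root-finding methods that would work here.

```python
import math

def claim_surplus_log_mgf(z, premium=3.5):
    """ln E{exp(z (X - premium))} for X = 0, 10, 20 with probabilities 0.8, 0.1, 0.1."""
    mgf = 0.8 + 0.1 * math.exp(10 * z) + 0.1 * math.exp(20 * z)
    return math.log(mgf) - premium * z

def positive_root(f, lo, hi, steps=200):
    """Bisection; assumes f(lo) < 0 < f(hi) (a hypothetical helper, not from the text)."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# The function is negative just to the right of 0 and positive at 0.1,
# so the positive root (the adjustment coefficient) lies in between.
gamma = positive_root(claim_surplus_log_mgf, 0.001, 0.1)
print(gamma)  # roughly 0.022, in line with the value quoted in the example
```

The lower end of the bracket is taken strictly positive because z = 0 is always a trivial root of the equation.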


However, the situation is different if we consider the portfolio as a whole, and set each

$$X_i = \xi_{i1} + \ldots + \xi_{iK_i}, \tag{2.2.20}$$

where now $\xi_{ij}$ is the amount of the $j$th claim in the $i$th period, and the r.v. $K_i$ is the number of claims in this period. It makes sense to emphasize that, while the $X_{ij}$'s above were payments to separate units (and could take on zero values), the $\xi_{ij}$ are claims arriving at the system (and may be assumed to be positive). Suppose that all r.v.'s are independent, the $\xi$'s are identically distributed, and the same is true for the $K_i$'s.

If we know the distributions of the $\xi$'s and $K_i$, we can find, at least theoretically, the distribution of $X_i$, and hence the adjustment coefficient. For example, if we accept the approximation (2.2.16), we can set there $m = m_\xi E\{K_i\}$ and $\sigma^2 = \sigma_\xi^2 E\{K_i\} + m_\xi^2\,\mathrm{Var}\{K_i\}$, where $m_\xi$ and $\sigma_\xi^2$ are the mean and variance of the $\xi$'s, respectively. A particular example is given in Exercise 9. The case where the $K_i$ are Poisson r.v.'s is the most interesting, but it is convenient for us to consider it later, in Section 2.2.5, after we explore the case of the Poisson process in continuous time.


2.2.4 The case of a homogeneous compound Poisson process 


We turn to continuous time and consider the claim surplus process $W_t = S_{(t)} - c_t$, where

$$S_{(t)} = \sum_{i=1}^{N_t} X_i, \tag{2.2.21}$$

$N_t$ is a Poisson process with intensity $\lambda$, and the $X_i$ are i.i.d. and do not depend on $N_t$. (See also Section 4.3.) We assume the $X$'s to be positive.


2. Ruin Models 355 


FIGURE 5. (Panels (a) and (b): graphs of $M(z)$.)


The reader remembers that $E\{N_t\} = \lambda t$, and $E\{S_{(t)}\} = m\lambda t$, where $m = E\{X_i\}$. Denote by $N_\Delta$, $S_\Delta$ the increments of the corresponding processes over an interval $\Delta = (t, t+\delta]$. Then $E\{N_\Delta\} = \lambda\delta$, and $E\{S_\Delta\} = m\lambda\delta$.

The point is that the r.v. $S_\Delta$ is also a compound Poisson r.v. Indeed, we can write that

$$S_\Delta = S_{(t+\delta)} - S_{(t)} = \sum_{i=N_t+1}^{N_{t+\delta}} X_i.$$

The process $N_t$ is a process with independent increments, and at any point $t$ “everything starts over as from the beginning”. So, the r.v. $S_\Delta$ has the same distribution as the r.v.

$$S_{(\delta)} = \sum_{i=1}^{N_\delta} X_i.$$

In particular, the m.g.f.

$$M_{S_\Delta}(z) = \exp\{\lambda\delta\,(M_X(z) - 1)\},$$

where $M_X(z)$ is the m.g.f. of the $X$'s (see (3.1.6) in Proposition 3.3).

Set the premium

$$c_t = (1+\theta)E\{S_{(t)}\} = (1+\theta)m\lambda t,$$

where again $\theta > 0$ is a security loading coefficient. Then the increment of the premium over the interval $\Delta$ is $c_\Delta = (1+\theta)m\lambda\delta$, and the m.g.f. of $W_\Delta$ is

$$M_{W_\Delta}(z) = \exp\{-c_\Delta z\}M_{S_\Delta}(z) = \exp\{-(1+\theta)m\lambda\delta z\}\exp\{\lambda\delta(M_X(z) - 1)\} = \exp\{\lambda\delta\,[M_X(z) - 1 - (1+\theta)mz]\}.$$


Thus, $M_{W_\Delta}(z) = 1$ iff $M_X(z) - 1 - (1+\theta)mz = 0$, which we write as

$$M_X(z) = 1 + (1+\theta)mz, \tag{2.2.22}$$

or

$$M_X(z) = 1 + cz, \tag{2.2.23}$$

where $c = (1+\theta)m$, the premium per one claim.




(a) The case of a "small" $\theta$. (b) The case of a "large" $\theta$.

FIGURE 6.


This is an equation for the adjustment coefficient.

It is noteworthy that this equation does not involve $\lambda$ and $\delta$, so the adjustment coefficient is specified only by the distribution of $X$ (compare with (2.2.19)).

Consider (2.2.23) and/or (2.2.22), setting for simplicity $M(z) = M_X(z)$. Since $X$ is positive, and $M(z)$ is convex, $M'(z) \ge M'(0) = m > 0$. Hence, $M(z)$ is strictly increasing, and if $M(z)$ is defined on $[0, \infty)$, then $M(z) \to \infty$ as $z \to \infty$. Note also that while $M'(0) = m$, the slope of the line specified by the r.-h.s. of (2.2.23) is $c > m$. See also Fig.5a. Hence, in this case a solution to (2.2.23) or (2.2.22) exists and is unique.

If $M(z)$ is defined on a finite interval $[0, z_0)$, $z_0 < \infty$, but $M(z) \to \infty$ as $z \to z_0$, we have the same; see also Fig.5b.

Consider the last case where $M(z)$ is defined on $[0, z_0]$, $z_0 < \infty$, and $M(z_0) < \infty$. Then a solution exists for all sufficiently small $\theta > 0$. Indeed, the closer $\theta$ is to zero, the closer the line $1 + (1+\theta)mz$ is to the line tangent to $M(z)$ at the origin; see Fig.6a. So, for small $\theta$, the line $1 + (1+\theta)mz$ intersects the graph of $M(z)$.

> The formal proof may run as follows. Since $X$ is positive and $e^x \ge 1 + x + \frac{1}{2}x^2$ for all $x \ge 0$, the m.g.f. $M(z) = E\{e^{zX}\} \ge E\{1 + zX + \frac{1}{2}z^2X^2\} = 1 + mz + \frac{1}{2}z^2E\{X^2\}$. Because $E\{X^2\} > 0$ and $1 + (1+\theta)mz \to 1 + mz$ as $\theta \to 0$, for sufficiently small $\theta$, we have $M(z_0) \ge 1 + mz_0 + \frac{1}{2}z_0^2E\{X^2\} > 1 + (1+\theta)mz_0$. <

For large $\theta$, a positive solution may not exist. In this case, we may set $\gamma = z_0$.

Indeed, let us first choose $\theta$ for which $\gamma$ exists and equals $z_0$. It would be the limiting case when $M(z_0) = 1 + (1+\theta)mz_0$. For such a $\theta$, Lundberg's bound for the ruin probability is $e^{-\gamma u} = e^{-z_0 u}$. But for a larger $\theta$, the ruin probability will be even smaller (the positive component of the surplus process gets larger). Hence, the same bound is true for such $\theta$'s also.


EXAMPLE 1. Let $X$ have a $\Gamma$-distribution with parameters $a, \nu$. Then $m = \nu/a$, and (2.2.22) amounts to $(1 - z/a)^{-\nu} = 1 + z(1+\theta)\nu/a$, or

$$\left(1 + z(1+\theta)\frac{\nu}{a}\right)\left(1 - \frac{z}{a}\right)^{\nu} = 1, \tag{2.2.24}$$

where we should consider only $z < a$, since for $z \ge a$ the m.g.f. does not exist.

Solving (2.2.24) numerically does not cause any difficulty; for $\nu = 1, 2$ one can write an explicit solution. If $\nu = 1$, i.e., $X$ is exponentially distributed, after simple algebra, (2.2.24) may be rewritten as

$$z\left(\theta - z\,\frac{1+\theta}{a}\right) = 0.$$

The positive root is

$$\gamma = \frac{\theta a}{1+\theta}. \tag{2.2.25}$$

For $\nu = 2$, since one root of (2.2.24) is $z = 0$, the equation may be reduced to a quadratic equation, and we should choose the positive root which is less than $a$. See also Exercise 11.
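For Γ-distributed claims with a general shape parameter ν, the equation $(1 - z/a)^{-\nu} = 1 + z(1+\theta)\nu/a$ on $(0, a)$ can be solved numerically. The following Python sketch is one possible way to do it; the function name and the bracketing strategy are illustrative assumptions, not part of the text. For ν = 1 the result can be compared with the explicit root θa/(1+θ).

```python
def adjustment_coefficient_gamma_claims(a, nu, theta, steps=200):
    """Positive root in (0, a) of (1 - z/a)**(-nu) = 1 + (1 + theta)*(nu/a)*z."""
    c = (1 + theta) * nu / a              # premium per claim, c = (1 + theta) * m

    def g(z):
        return (1 - z / a) ** (-nu) - 1 - c * z

    lo = a / 2
    while g(lo) >= 0:                     # move toward 0 until we are left of the root
        lo /= 2
    hi = lo
    while g(hi) <= 0:                     # move toward a until we are right of the root
        hi = (hi + a) / 2
    for _ in range(steps):                # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Exponential claims (nu = 1): the explicit root is theta * a / (1 + theta).
gamma_exp = adjustment_coefficient_gamma_claims(a=2.0, nu=1, theta=0.25)
print(gamma_exp, 0.25 * 2.0 / 1.25)

# nu = 2: the root must satisfy the equation and be smaller than a.
gamma_nu2 = adjustment_coefficient_gamma_claims(a=2.0, nu=2, theta=0.25)
print(gamma_nu2)
```

The search relies on the facts that the difference of the two sides is negative just to the right of zero (the line has the larger slope there) and tends to infinity as z approaches a.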


Using expansion (0.4.5.3), we can get a counterpart of approximation (2.2.16), writing (2.2.23) as $1 + mz + \frac{1}{2}m_2z^2 + o(z^2) = 1 + cz$, where $m_2 = E\{X^2\}$, the second moment of $X$. Neglecting the term $o(z^2)$, we come to the approximation

$$\gamma \approx \frac{2(c-m)}{m_2}. \tag{2.2.26}$$

EXAMPLE 2. In the situation of Example 1, $m = \frac{\nu}{a}$, $m_2 = \frac{\nu}{a^2} + \left(\frac{\nu}{a}\right)^2$, and since $c = (1+\theta)m$, approximation (2.2.26) gives

$$\gamma \approx \frac{2\theta a}{1+\nu}. \tag{2.2.27}$$

For $\nu = 1$, the precise $\gamma$ in (2.2.25) differs from approximation (2.2.27) by the multiplier $\frac{1}{1+\theta}$, which for small $\theta$ is close to one. See another example in Exercise 14.

2.2.5 The discrete time case revisited 


We continue to consider the compound Poisson model of the previous section, but now we will refer to ruin only as the event when the current surplus becomes negative at the end of a unit interval; say, at the end of a year. Formally, it means that the no-ruin (survival) probability is defined as $\tilde\varphi(u) = P(R_t \ge 0,\ t = 1, 2, \ldots)$. Time is still continuous, but we check for ruin only at integer moments of time. Here, we follow a tradition and mark the no-ruin probability by a tilde to distinguish this probability from $\varphi(u) = P(R_t \ge 0 \text{ for all } t \ge 0)$. Let $\tilde\psi(u) = 1 - \tilde\varphi(u)$, and as usual, $\psi(u) = 1 - \varphi(u)$.

Clearly,

$$\tilde\psi(u) \le \psi(u). \tag{2.2.28}$$

(If ruin occurs at an integer time moment, then ruin has occurred at some moment.)

To find $\tilde\psi(u)$, we apply the results of Section 2.2.2. Let us mark the $X$'s from that section by a tilde to distinguish them from the individual claims $X$ in this section. More precisely, let for $k = 1, 2, \ldots$,

$$\tilde X_k = \sum_{i=N_{k-1}+1}^{N_k} X_i,$$

the total claim for the unit time period $(k-1, k]$. Since $N_t$ is a homogeneous Poisson process, at the beginning of each period, the claim process starts to run as from the very beginning, all r.v.'s $\tilde X_k$ are i.i.d., and

$$E\{\tilde X_k\} = mE\{N_k - N_{k-1}\} = m\lambda.$$




Then the premium per unit time is $c = (1+\theta)E\{\tilde X_k\} = (1+\theta)m\lambda$. The r.v. $\tilde X_k$ has a compound Poisson distribution, and

$$M_{\tilde X}(z) = \exp\{\lambda(M_X(z) - 1)\}.$$

Hence in our case, the equation (2.2.11) may be written as

$$\exp\{-(1+\theta)m\lambda z\}\exp\{\lambda(M_X(z) - 1)\} = 1,$$

which is equivalent to $\lambda(M_X(z) - 1) - (1+\theta)m\lambda z = 0$, or

$$M_X(z) = 1 + (1+\theta)mz.$$

Thus, we have arrived at the same equation (2.2.22). This means that the adjustment coefficient in the discrete time case (for the scheme we consider) is the same as in the case of continuous time. So, the bounds $e^{-\gamma u}$ for $\psi(u)$ and $\tilde\psi(u)$ will be the same.

Does this contradict (2.2.28)? No, since we deal with upper bounds. To compare $\psi(u)$ and $\tilde\psi(u)$ we should also take into account the denominator $E\{\exp\{-\gamma R_\tau\} \mid \tau < \infty\}$ in (2.1.10), which is different for these two cases. The r.v. $(-R_\tau) = |R_\tau|$ is the deficit at the moment of ruin. If we consider only integer moments of time, $R_t$ may become negative before the end of the period of ruin, and $|R_\tau|$ corresponds to the deficit accumulated during this period, whereas in the general case, $|R_\tau|$ is the deficit corresponding to the claim at the moment of ruin. So, we should expect that, on the average, $|R_\tau|$ is larger in the discrete time case.

Certainly, this is not a proof; the proof itself follows from (2.2.28) since if the numerator in (2.1.10) is the same for both cases, the denominator must take on different values in the continuous and discrete cases.
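That the intensity of the Poisson process drops out of the equation can also be checked numerically. The sketch below assumes, purely for illustration, exponential claims with mean m = 1 and θ = 0.25, and solves the discrete-time equation in its logarithmic form for several intensities; the function name and parameter values are hypothetical.

```python
def discrete_time_gamma(lam, m=1.0, theta=0.25, steps=200):
    """Positive root of lam*(M_X(z) - 1) - (1 + theta)*m*lam*z = 0, exponential X."""
    def h(z):
        mgf_x = 1.0 / (1.0 - m * z)          # m.g.f. of an exponential claim, z < 1/m
        return lam * (mgf_x - 1.0) - (1.0 + theta) * m * lam * z

    lo, hi = 1e-6 / m, (1.0 - 1e-9) / m       # h(lo) < 0 < h(hi) for these parameters
    for _ in range(steps):                    # plain bisection
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

roots = [discrete_time_gamma(lam) for lam in (0.5, 1.0, 10.0)]
print(roots)  # all close to theta / (m * (1 + theta)) = 0.2
```

The common value agrees with the continuous-time adjustment coefficient for exponential claims, as the argument of this section predicts.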


2.2.6 The case of non-homogeneous compound Poisson processes 


We presuppose that the reader is familiar with the notion of a non-homogeneous Poisson 
process considered in Chapter 4. 

Since the equation for $\gamma$ in the previous section does not depend on $\lambda$, one may suppose that, as a matter of fact, the homogeneity of the process $N_t$ is not necessary, and we can get the same result in the general case. This is indeed true.

Let $\lambda(t)$ be the intensity function for $N_t$. As in Section 4.3, we set

$$\chi(t) = \int_0^t \lambda(s)\,ds, \qquad \chi_\Delta = \int_\Delta \lambda(s)\,ds$$

for any interval $\Delta$. As was shown in Chapter 4, $E\{N_t\} = \chi(t)$, $E\{N_\Delta\} = \chi_\Delta$, and hence $E\{S_\Delta\} = m\chi_\Delta$. The r.v. $S_\Delta$ is a compound Poisson r.v., and its m.g.f.

$$M_{S_\Delta}(z) = \exp\{\chi_\Delta(M_X(z) - 1)\}.$$

We set again the premium

$$c_t = (1+\theta)E\{S_{(t)}\} = (1+\theta)m\chi(t).$$




Then the increment of the premium over an interval $\Delta$ is $c_\Delta = (1+\theta)m\chi_\Delta$, and the m.g.f. of $W_\Delta$ is

$$M_{W_\Delta}(z) = \exp\{-c_\Delta z\}M_{S_\Delta}(z) = \exp\{-(1+\theta)m\chi_\Delta z\}\exp\{\chi_\Delta(M_X(z) - 1)\} = \exp\{\chi_\Delta(M_X(z) - 1 - (1+\theta)mz)\}.$$

Thus, $M_{W_\Delta}(z) = 1$ iff $M_X(z) - 1 - (1+\theta)mz = 0$, which again leads to (2.2.22). So, the equation does not involve $\chi_\Delta$, and the adjustment coefficient is again specified only by the distribution of $X$.


2.3 Finding an initial surplus 


This section is to emphasize that having an estimate of the ruin probability, we are able to estimate the initial surplus sufficient to keep the ruin probability less than a given desirable level. Denote this level by $\alpha$. Say, if we choose $\alpha = 0.05$, then we accept a 5% risk of being ruined.

We proceed from the bound

$$\psi(u) \le \exp\{-\gamma u\}, \tag{2.3.1}$$

and set $\exp\{-\gamma u\} = \alpha$. Then,

$$u = \frac{s}{\gamma}, \tag{2.3.2}$$

where $s = \ln(1/\alpha)$. If $\alpha < 1$, then $s = \ln(1/\alpha) > 0$. From (2.3.1) it follows that for such a choice of the initial surplus $u$, the ruin probability does not exceed the security level $\alpha$. The estimate $s/\gamma$ is obtained with some leeway because we proceed not from the real ruin probability but from its upper bound. In any case,

For the ruin probability to be less than $\alpha$, it suffices that the initial surplus $u \ge \frac{1}{\gamma}\ln\frac{1}{\alpha} = \frac{s}{\gamma}$. (2.3.3)

EXAMPLE 1. Let us revisit Example 2.2.2-1, where we found that $0.136 < \gamma < 0.137$. If we estimate $u$ proceeding from the inequality $u \ge s/\gamma$, then we should take the lower bound for $\gamma$, that is, 0.136. Indeed, if $u \ge \frac{s}{0.136}$, then we have a right to write that

$$u \ge \frac{s}{0.136} > \frac{s}{\gamma},$$

which implies that the ruin probability is less than $\alpha$. In its turn, $\frac{1}{0.136} < 7.36$. So, we take $u \ge 7.36s$, and it will be an estimate with some leeway.

For example, for $\alpha = 0.05$, we have $s = \ln(1/0.05) = \ln 20 < 2.996 < 3$, and we come to

$$u \ge 3 \cdot 7.36 = 22.08.$$
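The computation in Example 1 is easy to reproduce. The Python sketch below uses a hypothetical helper name and evaluates u = s/γ without the rounding up of s and 1/γ used in the text, so the resulting value is slightly smaller than 22.08.

```python
import math

def sufficient_initial_surplus(alpha, gamma_lower):
    """Initial surplus u = ln(1/alpha) / gamma for which exp(-gamma * u) = alpha."""
    return math.log(1 / alpha) / gamma_lower

# alpha = 0.05 and the lower bound 0.136 for the adjustment coefficient.
u = sufficient_initial_surplus(0.05, 0.136)
print(u)                      # a little above 22, below the rounded value 22.08
print(math.exp(-0.136 * u))   # Lundberg's bound at this u is alpha = 0.05
```

Using the lower bound for γ keeps the estimate on the safe side: a smaller γ gives a larger required surplus.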
Route 1 => page 363 




However, there is another way to keep the ruin probability lower than a given level: to 
increase the premium. To a certain degree, both characteristics—the initial surplus and the 
premium—are under the control of the insurer, and the determination of them consists in a 
trade-off between these characteristics. We consider it in the next section. 


2.4 Trade-off between the premium and initial surplus 


For the discrete time case, by (2.2.14),

$$c = \frac{1}{\gamma}\ln M_X(\gamma). \tag{2.4.1}$$

In the compound Poisson case, (2.2.23) implies

$$c = \frac{1}{\gamma}\left(M_X(\gamma) - 1\right). \tag{2.4.2}$$

(Note that these representations do not differ much for small $\gamma$. Indeed, $M_X(\gamma) - 1$ is small for $\gamma$ close to 0 (say why). Since $\ln(1+x) = x + o(x)$ for small $x$'s, we can write $\ln M_X(\gamma) = \ln[1 + (M_X(\gamma) - 1)] = M_X(\gamma) - 1 + o(M_X(\gamma) - 1)$, where the second term is small.)

Our goal is to find $c$ and $u$ for which the ruin probability is not greater than a given security level $\alpha$. Since we have only an upper bound for the ruin probability, we have to proceed from this upper bound. For the upper bound in (2.3.1) to equal $\alpha$, we should have $e^{-\gamma u} = \alpha$, or

$$\gamma = \frac{1}{u}\ln\left(\frac{1}{\alpha}\right) = \frac{s}{u}. \tag{2.4.3}$$

Now, if we replace $\gamma$ in (2.4.1) and (2.4.2) by the r.-h.s. of (2.4.3), we will establish a relation between $u$ and $c$ which ensures that the ruin probability is not larger than $\alpha$. (As we understand, with leeway since we proceeded from an upper bound for the ruin probability.) Thus, in the discrete time case, we have

$$c = \frac{u}{s}\ln M_X\left(\frac{s}{u}\right), \tag{2.4.4}$$

and in the compound Poisson process case,

$$c = \frac{u}{s}\left(M_X\left(\frac{s}{u}\right) - 1\right). \tag{2.4.5}$$

If, say, in the compound Poisson case, $c$ is not equal to but larger than the r.-h.s. of (2.4.5), then the bound $e^{-\gamma u}$ will be smaller than $\alpha$. From a common-sense point of view, it is understandable. If the premium is larger than required, then the ruin probability will be even smaller than we wish.

Formally, it follows from the fact that $M_X(\gamma)$ is convex, and hence the r.-h.s. of (2.4.2) is increasing in $\gamma$. (Graph $M(z)$ and consider the slope of the line connecting the points $(0, 1)$ and $(z, M(z))$. We skip formalities here.)

The same is true for the discrete time case. Advice on how to show the monotonicity of the r.-h.s. of (2.4.1) is given in Exercise 12.




EXAMPLE 2. Suppose time is discrete, and $X$ is $(m, \sigma^2)$-normal. In this case, $\ln M_X(z) = mz + \sigma^2z^2/2$, and the reader can readily get from (2.4.4) that

$$c = m + \frac{\sigma^2 s}{2} \cdot \frac{1}{u}. \tag{2.4.6}$$

The same may also be obtained by substituting $\gamma$ in (2.4.3) by the explicit expression for $\gamma$ in (2.2.13).

The curve (2.4.6) is a hyperbola. Its graph is the boundary of the area depicted in Fig.7. For all points $(u, c)$ above the curve (2.4.6) in Fig.7, the ruin probability is less than $\alpha$.

FIGURE 7.

We see also that $c \to m$ as $u \to \infty$. This has a natural interpretation. If the initial surplus is large, then the security loading may be small, i.e., close to zero. This means that the premium per claim may be close to the expected loss per claim. The last quantity is $E\{X\} = m$.

Let, say, $m = 10$ and $\sigma^2 = 4$. To make our formulas nicer, we choose as $\alpha$ not 0.05 or 0.01, but, say, $e^{-4} \approx 0.01832$. Then $s = 4$, which will make calculations simpler.

(The choice of $\alpha$ is rather subjective anyway. If, for example, $\alpha = 0.05$ seems proper, one may choose $\alpha = e^{-3} \approx 0.049787$, which is very close to 0.05.)

For $u = 8$, the premium $c$ should be equal to 11, that is, the relative loading is 10%. If it is too much, but we should keep the ruin probability at the same level, then we should increase the initial surplus. For example, for the 5% loading, $c = 10.5$ and (2.4.6) leads to $u = 16$.
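The two numerical answers in this example follow from the relation c = m + σ²s/(2u) for the normal case. A short Python sketch, with illustrative function names that do not come from the text, reproduces them:

```python
import math

def premium_for_surplus(u, m, var, alpha):
    """Premium c = m + var * s / (2u), with s = ln(1/alpha), for normal claims."""
    s = math.log(1 / alpha)
    return m + var * s / (2 * u)

def surplus_for_premium(c, m, var, alpha):
    """Initial surplus u obtained by solving the same relation for u (needs c > m)."""
    s = math.log(1 / alpha)
    return var * s / (2 * (c - m))

alpha = math.exp(-4)                                   # so that s = 4
c_at_8 = premium_for_surplus(8, m=10, var=4, alpha=alpha)
u_at_5_percent = surplus_for_premium(10.5, m=10, var=4, alpha=alpha)
print(c_at_8, u_at_5_percent)                          # close to 11 and 16
```

Halving the loading from 10% to 5% doubles the required initial surplus, which is the hyperbolic trade-off in its simplest form.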


Two more things are noteworthy. First, the fact that $c \to \infty$ as $u \to 0$ in (2.4.6) should not mislead us. This does not reflect the real situation but rather the circumstance that we are dealing with an estimate which is not accurate for small $u$. Certainly, even if the initial surplus equals zero, for any $c > m$, there will be no ruin with some positive probability. If the premium $c$ is large (but not infinitely large), the ruin probability should be small. So, if $u = 0$, we do not need $c$ to be infinitely large for the ruin probability to be smaller than the level $\alpha$.

On the other hand, for large $u$, (2.4.6) gives a good approximation not only for the normal case but for practically arbitrary $X$'s.

Indeed, a large $u$ leads to a small $\gamma$; see (2.4.3). As we saw in Section 2.2.2, in this case, one can use the approximation (2.2.16). This leads to (2.4.6) as an approximation. The larger $u$ is, the smaller $\gamma$ is, and hence the better the accuracy of this approximation. Certainly, this argument is heuristic, and for rigorous estimation we should quantitatively evaluate the accuracy of the approximation.


Now, let us consider an example dealing with continuous time. 




EXAMPLE 3. Consider the compound Poisson process with $X$'s uniformly distributed on $[0,1]$. To make the example illustrative, set $\alpha = e^{-4} \approx 0.018$ (see reasoning on this point in Example 2). Then $s = 4$. As we know, to evaluate the adjustment coefficient, we do not need any information about the intensity of the Poisson process.

In our case $M_X(z) = (e^z - 1)/z$, and (2.4.5) amounts to

$$c = \frac{u}{4}\left[\frac{u}{4}\left(e^{4/u} - 1\right) - 1\right].$$

The graph of this function is the border of the area depicted in Fig.8. All points above the border correspond to the ruin probabilities that are less than $\alpha$. Note that $c \to 1/2$ as $u \to \infty$. (Set $x = 4/u$. Then $c = \frac{1}{x^2}\left(e^x - 1 - x\right) \to \frac{1}{2}$ as $x \to 0$. The last fact may be proved by L'Hôpital's rule.)

As we already noted in Example 2, the convergence mentioned is not surprising. If the initial surplus is large, the premium per claim may be close to the expected loss per claim. In our case, this is $E\{X\} = 1/2$.

As to the fact that $c \to \infty$ as $u \to 0$, the corresponding remark from Example 2 applies to this case too.
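The behavior of the premium in Example 3 can be tabulated numerically. The following Python sketch assumes claims uniform on [0, 1] and s = 4 as in the example; the function name is hypothetical.

```python
import math

def premium_uniform_claims(u, s=4.0):
    """c = (u/s) * (M_X(s/u) - 1) with M_X(z) = (exp(z) - 1) / z (uniform claims)."""
    x = s / u
    mgf = math.expm1(x) / x        # expm1 keeps accuracy when x is small
    return (mgf - 1.0) / x         # equals (exp(x) - 1 - x) / x**2

for u in (1, 4, 16, 100, 1000):
    print(u, premium_uniform_claims(u))
# The premium decreases in u and approaches E{X} = 1/2 for large u.
```

At u = 4 (so x = 1) the premium equals e − 2, which gives a convenient check of the formula.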


> In conclusion, we mention an approximation concerning the case of the homogeneous 
compound Poisson process and a small security loading. Let $c = (1+\theta)m$ as in Section 2.2.4, and $m_2 = E\{X_i^2\}$. Then

$$\psi(u) - \frac{1}{1+\theta}\exp\left\{-\frac{2\theta m u}{(1+\theta)m_2}\right\} \to 0 \ \text{ as } \theta \to 0, \text{ uniformly in } u \ge 0. \tag{2.4.7}$$

This approximation is obtained in [69], and estimates of its accuracy are given in [69] and [12, Section 5.3]. Some refinements of (2.4.7) may also be found in [12, Section 5.3].

The main point in (2.4.7) is that the approximation is true for all $u$, including $u$ depending on $\theta$. As above, let $s = \ln(1/\alpha)$, and

$$u = s\,\frac{m_2}{2m} \cdot \frac{1+\theta}{\theta} \approx s\,\frac{m_2}{2m} \cdot \frac{1}{\theta} \tag{2.4.8}$$

for small $\theta$. Then, by (2.4.7), for small $\theta$,

$$\psi(u) \approx \frac{1}{1+\theta}e^{-s} \approx \alpha.$$

Hence, (2.4.8) represents the trade-off between $u$ and $\theta$ for small $\theta$ and the security level $\alpha$. Certainly, since we consider small $\theta$, the initial surplus $u$ is large.

Note also that (2.4.8) does not contradict what we got before. Since $M_X(z) = 1 + mz + m_2(z^2/2) + o(z^2)$ for small $z$ [see (0.4.5.2)], from (2.4.5) we get that for large $u$

$$c = (1+\theta)m = m + s\,\frac{m_2}{2u} + o\left(\frac{1}{u}\right).$$

From the last relation it follows that

$$\theta = \frac{s\,m_2}{2um} + o\left(\frac{1}{u}\right),$$

which is consistent with (2.4.8) for small $\theta$ or (which is equivalent) for large $u$. <


2.5 Three cases where the ruin probability may be computed 
precisely 


As was repeatedly noted, so far we have dealt only with estimates of the ruin probability. 
Now we will consider cases when the denominator in the main result (2.1.10) may be 
computed explicitly. 


2.5.1 The case with an exponentially distributed claim size 


Let us consider the model $R_t = u + ct - S_{(t)}$; time may be discrete or continuous. Suppose the claims $X$ are exponentially distributed, $E\{X\} = m$. Denote by $\tau$ the ruin time. The value of the process $R_t$ at time $\tau$ is the r.v. $R_\tau$.

At the moment $\tau$ (if it occurs), the process $R_t$ makes a jump down and crosses zero level. So, $R_\tau < 0$. Certainly, the jump may occur only if at this moment a sufficiently large claim arrived. Denote by $R_{\tau-0}$ the value of the process before the jump, and by $\tilde X$ the size of the jump, which is equal to the claim at the moment $\tau$. The tilde indicates that this is not a usual claim but the claim at the moment of ruin. Then $R_\tau = R_{\tau-0} - \tilde X = -(\tilde X - R_{\tau-0})$, where $\tilde X - R_{\tau-0}$ is the overshoot. See also Fig.9. Denote the overshoot $\tilde X - R_{\tau-0}$ by $D$. Clearly, $D = |R_\tau|$; see again Fig.9.

Suppose the r.v. $R_{\tau-0}$ assumed a value $r$. Of course, the distribution of the claim $\tilde X$ at the moment $\tau$ depends on $r$, since for ruin to occur the claim $\tilde X$ must be larger than $r$. So, the distribution of $\tilde X$ is equal to the conditional distribution of the exponential r.v. $X$, given that $X > r$. However, in view of the memoryless property, the overshoot $D = \tilde X - r$ does not depend on $r$, and has the same distribution as each claim $X$. That is, $D$ has the exponential distribution with the same parameter $a = 1/m$, where as usual $m = E\{X_i\}$. (See also Section 2.2.1.1.)

On the other hand, since $D = |R_\tau|$, the denominator in (2.1.10) is

$$E\{\exp\{-\gamma R_\tau\} \mid \tau < \infty\} = E\{\exp\{\gamma|R_\tau|\} \mid \tau < \infty\} = E\{\exp\{\gamma D\} \mid \tau < \infty\}.$$

FIGURE 9. (Labels: the moment of ruin; $\tilde X$, the whole drop; the overshoot.)




As we saw, the r.v. $D$ does not depend on $\tau$, and hence

$$E\{\exp\{\gamma D\} \mid \tau < \infty\} = E\{\exp\{\gamma D\}\} = M_D(\gamma) = \frac{1}{1 - m\gamma}.$$

Thus, $E\{\exp\{-\gamma R_\tau\} \mid \tau < \infty\} = 1/(1 - m\gamma)$, and by (2.1.10),

$$\psi(u) = (1 - m\gamma)\exp\{-\gamma u\}. \tag{2.5.1}$$

In the discrete time case, in accordance with (2.2.11), $\gamma$ is the solution to the equation

$$e^{-cz} = 1 - mz. \tag{2.5.2}$$

An explicit formula for the solution does not exist, but it is easy to solve such an equation numerically (see also Exercise 17).

For the compound Poisson process, (2.2.25) implies

$$\gamma = \frac{\theta}{m(1+\theta)}, \tag{2.5.3}$$

which together with (2.5.1) gives an explicit formula:

$$\psi(u) = \frac{1}{1+\theta}\exp\left\{-\frac{\theta u}{m(1+\theta)}\right\}. \tag{2.5.4}$$
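To see how much Lundberg's bound overstates the exact ruin probability for exponential claims in the compound Poisson model, one can evaluate both. The Python sketch below uses illustrative function names and parameter values that are not taken from the text.

```python
import math

def adjustment_coefficient_exponential(m, theta):
    """gamma = theta / (m (1 + theta)), as in (2.5.3)."""
    return theta / (m * (1 + theta))

def ruin_probability_exponential(u, m, theta):
    """Exact ruin probability exp(-gamma u) / (1 + theta), as in (2.5.4)."""
    gamma = adjustment_coefficient_exponential(m, theta)
    return math.exp(-gamma * u) / (1 + theta)

def lundberg_bound(u, m, theta):
    """Upper bound exp(-gamma u) for the same model."""
    return math.exp(-adjustment_coefficient_exponential(m, theta) * u)

for u in (0, 5, 10, 20):
    exact = ruin_probability_exponential(u, m=1.0, theta=0.25)
    bound = lundberg_bound(u, m=1.0, theta=0.25)
    print(u, exact, bound)   # the bound exceeds the exact value by the factor 1 + theta
```

For every u the ratio of the bound to the exact value is the constant 1 + θ, so the relative error of the bound is small exactly when the loading is small.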


Note also that the above reasoning is better suited to the continuous time case. In the discrete time situation, we usually view $X_i$ as an aggregate claim in the $i$th period, that is, $X_i$ is the sum of r.v.'s, and it would be unrealistic to assume that the distribution of $X_i$ is exponential.


2.5.2 The case of the simple random walk 


It is useful to check that Theorem 2 in this case leads to the result of Section 4.4.3.2. In the model of Section 4.4.3.2, we deal not with claims but with increments of the surplus process. Accordingly, the claim surplus process in our case is $W_t = Y_1 + \ldots + Y_t$, where the increment of the total claim surplus at the moment $i$ is $Y_i = -1$ or $1$, with probabilities $p$ and $q = 1 - p$, respectively. ($Y_i$ indicates a loss, so when $Y_i = -1$, the surplus process moves up.) We assume $p > 1/2$.

Let $u$ be an integer. Since the process $R_t$ each time jumps up or down exactly by one unit, at the moment of ruin (if any), $R_\tau = -1$, and hence $E\{\exp\{-\gamma R_\tau\} \mid \tau < \infty\} = \exp\{\gamma\}$.

Then, by (2.1.10),

$$\psi(u) = \frac{e^{-\gamma u}}{e^{\gamma}} = e^{-\gamma(u+1)}. \tag{2.5.5}$$

On the other hand, in this case, equation (2.2.8) amounts to

$$pe^{-z} + qe^{z} = 1. \tag{2.5.6}$$

Setting $e^z = x$, we rewrite (2.5.6) as $p + qx^2 = x$. There are two solutions to this equation: $x = 1$ and $x = p/q$. Since we are looking for a positive $z$, we should choose the latter solution. Hence $\gamma = \ln(p/q)$. Substituting it into (2.5.5) we have

$$\psi(u) = (q/p)^{u+1} = r^{u+1}, \tag{2.5.7}$$

where, as in Section 4.4.3.2, $r = (1-p)/p$.

The difference between (2.5.7) and the formula $\psi(u) = r^u$ in Section 4.4.3.2 is explained by the fact that in Section 4.4.3.2.2 we defined the ruin time as the moment when the process first reaches zero level, while here the ruin time is the moment when the process takes on a negative value. In the framework of Section 4.4.3.2, the latter definition corresponds to the replacement of the initial capital $u$ by $u + 1$, which leads to (2.5.5).
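The random walk formulas can be sanity-checked numerically. The sketch below assumes p = 0.6 for illustration and uses a first-step argument that is not spelled out in the text: conditioning on the first increment gives ψ(u) = pψ(u+1) + qψ(u−1) for u ≥ 0, with ψ(−1) = 1 because a negative surplus means ruin. The function name is hypothetical.

```python
import math

def ruin_probability_random_walk(u, p):
    """psi(u) = (q/p)**(u + 1) for the simple random walk with p > 1/2."""
    q = 1 - p
    return (q / p) ** (u + 1)

p, q = 0.6, 0.4
gamma = math.log(p / q)

# gamma = ln(p/q) should solve p e^{-z} + q e^{z} = 1.
residual = p * math.exp(-gamma) + q * math.exp(gamma) - 1

# First-step check: psi(u) = p psi(u+1) + q psi(u-1), with psi(-1) = 1.
gaps = [
    ruin_probability_random_walk(u, p)
    - (p * ruin_probability_random_walk(u + 1, p)
       + q * ruin_probability_random_walk(u - 1, p))
    for u in range(6)
]
print(residual, max(abs(g) for g in gaps))  # both are zero up to rounding
```

The boundary value ψ(−1) = 1 is reproduced automatically by the formula, since the exponent u + 1 vanishes there.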


Route 1 = page 377 


2.5.3 The case of Brownian motion 


As in Section 5.2.4.4, let $R_t = u + \mu t + \sigma w_t$, where $w_t$ is a standard Brownian motion, and $\mu > 0$. Since $R_t$ is now a continuous process, it will not overshoot zero level, but first will hit (touch) it. In this case, it is reasonable to redefine the notion of ruin setting

$$\tau = \min\{t \ge 0 : R_t = 0\}. \tag{2.5.8}$$

All results above continue to be true in this case.

> As follows from the remark at the end of Section 5.1.1.3, once the process reaches zero level at time $\tau$, in any arbitrarily small interval $(\tau, \tau+\delta]$ the process will take negative values infinitely many times, rapidly oscillating around zero for a while. So, the previous definition $\min\{t \ge 0 : R_t < 0\}$ is not proper: the minimum does not exist, and we should write $\inf\{t \ge 0 : R_t < 0\}$. In view of the continuity of the process, the latter definition is equivalent to (2.5.8). <

Thus, in our case, $R_\tau = 0$. Hence, the denominator $E\{\exp\{-\gamma R_\tau\} \mid \tau < \infty\} = 1$, and $\psi(u) = \exp\{-\gamma u\}$.

To find $\gamma$, we first realize that in our case $W_t = -\mu t - \sigma w_t$, and hence, $W_\Delta = -\mu|\Delta| - \sigma w_\Delta$, where $|\Delta|$ is the length of the interval $\Delta$. Note that $-\sigma w_\Delta$ is a normal r.v. with zero mean and variance $\sigma^2|\Delta|$. (Multiplication by $-1$ does not change the distribution of a symmetric r.v.) Then $M_\Delta(z) = \exp\{-\mu|\Delta|z + \sigma^2|\Delta|z^2/2\} = \exp\{-|\Delta|z(\mu - \sigma^2z/2)\}$, and the equation (2.1.2) is equivalent to the equation $\gamma(\mu - \sigma^2\gamma/2) = 0$. A unique positive solution is

$$\gamma = 2\mu/\sigma^2.$$

Consequently,

$$\psi(u) = \exp\{-2\mu u/\sigma^2\},$$

which coincides with (5.2.4.9).

The precise formula for $\psi_T(u) = P(\tau \le T)$ was obtained in Section 5.2.4.5.


2.6 The martingale approach and a generalization of Theorem 2 


We consider now the ruin problem in the martingale framework, presupposing that the 
reader is familiar with the basic notions of Section 5.2. To the author’s knowledge, the first 
use of martingales in Actuarial Modeling is due to H. Gerber (see, e.g., [41], [42]) and F. 
DeVylder [34]. 




The main goal of this section is not to prove Theorem 2, although we will do that. Rather, 
it is to show that this theorem is not a tricky analytical fact but a direct and almost obvious 
corollary from the martingale stopping property. 

Assume that all processes under consideration are functions of an original process $\xi_t$ as it was defined in Section 5.2.2. As such a process, we can take the surplus process $R_t$ itself, but it is more convenient to define the original process separately.

As in Section 5.2.2, we denote by $\xi^t$ the collection $\{\xi_u;\ 0 \le u \le t\}$, i.e., the whole history of the process until time $t$.

In the framework of this section, the surplus process $R_t$ is a rather general process. In particular, we will not assume that it is a process with independent increments. However, let us first look at what will happen when this condition holds.

It is convenient to define the independence of increments in terms of $\xi^t$. More specifically, assume for a while that

For any $t$ and any interval $\Delta = (t, t+s]$, the r.v. $R_\Delta$ does not depend on $\xi^t$. (2.6.1)

(Here, as in Section 5.2.2, $R_\Delta = R_{t+s} - R_t$.)

Since $R_t$ is completely determined by $\xi^t$, from (2.6.1) it follows that $R_\Delta$ does not depend on $R_t$. Vice versa, if we take $R_t$ as the original process (and we can do that), then property (2.6.1) will follow from the independence of increments of $R_t$.

As in Section 2.1.3, let $W_t = u - R_t$, the claim surplus process. Note that condition (2.1.6) is equivalent to

$$W_t \xrightarrow{P} -\infty \ \text{ as } t \to \infty. \tag{2.6.2}$$

If property (2.6.1) holds for $R_t$, it holds for $W_t$ also. Then, for any $z$, $t$, $s$ and interval $\Delta = (t, t+s]$,

$$E\{e^{zW_{t+s}} \mid \xi^t\} = E\{e^{zW_t}e^{zW_\Delta} \mid \xi^t\} = e^{zW_t}E\{e^{zW_\Delta} \mid \xi^t\} = e^{zW_t}E\{e^{zW_\Delta}\} = e^{zW_t}M_\Delta(z), \tag{2.6.3}$$

where we denote by $M_\Delta(z)$ the m.g.f. of $W_\Delta$.

Thus, if $M_\Delta(z) = 1$, then $E\{\exp\{zW_{t+s}\} \mid \xi^t\} = \exp\{zW_t\}$. As we know, there may be only one positive solution $\gamma$ (if any) to the equation $M_\Delta(z) = 1$. For $\gamma$ so defined, let $Y_t = e^{\gamma W_t}$. Then $E\{Y_{t+s} \mid \xi^t\} = Y_t$, and hence,

The process $Y_t = e^{\gamma W_t}$ is a martingale. (2.6.4)

As a matter of fact, (2.6.4) together with (2.6.2) is the only thing we need. So, we may weaken condition (2.6.1), adopting (2.6.4) itself as the original condition. As we saw, if (2.6.1) is true, then (2.6.4) is also true, but certainly the process $Y_t$ may be a martingale while increments of $R_t$ are dependent.
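As a concrete illustration of the martingale property, take the simple random walk of Section 2.5.2, for which γ = ln(p/q). A martingale has constant expectation, so $E\{e^{\gamma W_n}\}$ must equal $E\{e^{\gamma W_0}\} = 1$ for every n. The Python sketch below (the function name is hypothetical) computes this expectation exactly by summing over the number of upward steps of $W_n$.

```python
import math

def expected_exp_martingale(n, p):
    """E{exp(gamma * W_n)} for W_n = Y_1 + ... + Y_n, where Y_i = -1 w.p. p, +1 w.p. q."""
    q = 1 - p
    gamma = math.log(p / q)
    total = 0.0
    for k in range(n + 1):                    # k = number of increments equal to +1
        prob = math.comb(n, k) * q**k * p**(n - k)
        total += prob * math.exp(gamma * (2 * k - n))
    return total

print([expected_exp_martingale(n, 0.6) for n in (1, 5, 25)])  # each is 1 up to rounding
```

With any exponent other than γ (or zero) the expectation would grow or decay geometrically in n, which is why the adjustment coefficient is singled out.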


Thus, regarding the claim surplus process $W_t$, we eventually assume that
(a) $W_0 = 0$;
(b) condition (2.6.2) holds;
(c) there exists a number $\gamma > 0$ for which (2.6.4) is true.




Our next step is to apply the martingale stopping property (5.2.4.4). If we had had a right to do that, we would have written

$$1 = E\{e^0\} = E\{e^{\gamma W_0}\} = E\{Y_0\} = E\{Y_\tau\} = E\{\exp\{\gamma(u - R_\tau)\}\} = e^{\gamma u}E\{\exp\{-\gamma R_\tau\}\}. \tag{2.6.5}$$

Then it would have remained to recall that $\tau$, and hence $R_\tau$, is an improper (or defective) r.v., that is, $P(\tau < \infty) < 1$. If ruin does not occur, then we can say that $\tau = \infty$. Thus, we can write, at a somewhat heuristic level, that

$$E\{\exp\{-\gamma R_\tau\}\} = E\{\exp\{-\gamma R_\tau\} \mid \tau < \infty\}P(\tau < \infty) + E\{\exp\{-\gamma R_\tau\} \mid \tau = \infty\}P(\tau = \infty). \tag{2.6.6}$$

In view of (2.6.2), $R_t = u - W_t \xrightarrow{P} +\infty$ as $t \to \infty$. Hence, if $\tau = \infty$, we can set (again reasoning a bit heuristically) that $R_\tau = \infty$, and hence $E\{\exp\{-\gamma R_\tau\} \mid \tau = \infty\} = 0$.

Thus, from (2.6.5) and (2.6.6) it follows that

$$1 = e^{\gamma u}E\{\exp\{-\gamma R_\tau\} \mid \tau < \infty\}P(\tau < \infty),$$

and we come to the basic result (2.1.10):

$$P(\tau < \infty) = \frac{e^{-\gamma u}}{E\{\exp\{-\gamma R_\tau\} \mid \tau < \infty\}}. \tag{2.6.7}$$

However, the problem is that since $\tau$ is improper and, consequently, is not a stopping time, we cannot apply the martingale stopping property directly. This obstacle, however, may be easily overcome if we apply a sort of truncation and use condition (2.6.2), which we do in the proof below.

So, we state and prove the following theorem.

Theorem 4 Let the above conditions (a)-(c) hold. Then (2.6.7) is true.


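Proof. Fix $T > 0$, and let $\tau_T = \min\{\tau, T\}$. Since $T$ is fixed, $\tau_T$ is bounded, and Condition 1 of Theorem 5.6 holds. Applying this theorem, we have

$$\begin{aligned}
1 = E\{Y_0\} &= E\{Y_{\tau_T}\} = E\{\exp\{\gamma W_{\tau_T}\}\} \\
&= E\{\exp\{\gamma W_{\tau_T}\} \mid \tau \le T\}P(\tau \le T) + E\{\exp\{\gamma W_{\tau_T}\} \mid \tau > T\}P(\tau > T) \\
&= E\{\exp\{\gamma W_\tau\} \mid \tau \le T\}P(\tau \le T) + E\{\exp\{\gamma W_T\} \mid \tau > T\}P(\tau > T) \\
&= E\{\exp\{\gamma(u - R_\tau)\} \mid \tau \le T\}P(\tau \le T) + E\{\exp\{\gamma W_T\} \mid \tau > T\}P(\tau > T).
\end{aligned} \tag{2.6.8}$$

Let $T \to \infty$. The first term in (2.6.8) is

$$E\{\exp\{\gamma(u - R_\tau)\} \mid \tau \le T\}P(\tau \le T) \to e^{\gamma u}E\{\exp\{-\gamma R_\tau\} \mid \tau < \infty\}P(\tau < \infty). \tag{2.6.9}$$

It remains to prove that the second term in (2.6.8) vanishes as $T \to \infty$. Note that by the definition of $\tau$, if $T < \tau$, then $R_T \ge 0$, and $W_T = u - R_T \le u$. So, fixing a number $k > 0$, we get that the second term in (2.6.8) is equal to

$$\begin{aligned}
E\{\exp\{\gamma W_T\} \mid \tau > T\}P(\tau > T)
&= E\{\exp\{\gamma W_T\} \mid \tau > T,\ W_T > -k\}P(W_T > -k \mid \tau > T)P(\tau > T) \\
&\quad + E\{\exp\{\gamma W_T\} \mid \tau > T,\ W_T \le -k\}P(W_T \le -k \mid \tau > T)P(\tau > T) \\
&\le \exp\{\gamma u\}P(W_T > -k,\ \tau > T) + \exp\{-\gamma k\}P(W_T \le -k,\ \tau > T) \\
&\le \exp\{\gamma u\}P(W_T > -k) + \exp\{-\gamma k\}.
\end{aligned} \tag{2.6.10}$$

Let $T \to \infty$. Condition (2.6.2), by definition, means that $P(W_T > -k) \to 0$ as $T \to \infty$, for any fixed $k > 0$. Hence, from (2.6.10) it follows that

$$\lim_{T \to \infty} E\{\exp\{\gamma W_T\} \mid \tau > T\}P(\tau > T) \le e^{-\gamma k}$$

for any $k$. The term on the left does not depend on $k$, so we can let $k \to \infty$, which for $\gamma > 0$ implies that the limit on the left is zero. ∎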


2.7 The renewal approach 


The essentially different approach of this section also proves to be quite efficient.


2.7.1 The first surplus below the initial level 


In this subsection, we consider the compound Poisson process case, assuming that R; = 
u +ct —S(,), the process S is defined in (2.2.21), the process N, is a homogeneous Poisson 
process with a constant intensity A, and c = (1 + 6)mA, where m = E{X;} > 0. When 
considering a separate X;, we will omit the index i. 

Since it does not make sense to consider claims equal to zero, we assume also that 


P(X >0)=1. (2.7.1) 


It will be convenient for us to indicate the dependence of the ruin time on u explicitly, so 
we set Tu = min{t : R; <0|Ro = u}. 

Let q be the probability that the process will ever fall below the initial level u. It may 
happen if and only if the process ct — S(;) falls below zero level. Hence, q does not depend 
on u, and equals (0), the ruin probability in the case when the initial surplus equals zero. 

For the same reason, the size of the drop below the level $u$ at the moment when the process first crosses this level does not depend on $u$ either. The distribution of the size of this drop coincides with the distribution of $|R_{\tau_0}|$, the absolute value of the deficit of the surplus at the ruin time if the process starts from zero.

It turns out that $q$ and the distribution of $|R_{\tau_0}|$ may be represented in a simple form. Let $F(x)$ be the d.f. of the $X$'s.


Theorem 5 For any $x \ge 0$,

$$
P(\tau_0 < \infty,\ |R_{\tau_0}| \le x) = \frac{1}{(1+\theta)m} \int_0^x (1 - F(y))\,dy. \tag{2.7.2}
$$


We prove (2.7.2) in Section 2.7.4; for now, we discuss several interesting corollaries of this theorem.

First, setting $x = \infty$ in (2.7.2), we get that

$$
P(\tau_0 < \infty) = \psi(0) = \frac{1}{(1+\theta)m} \int_0^\infty (1 - F(y))\,dy = \frac{1}{1+\theta} \tag{2.7.3}
$$

by virtue of the formula (0.2.2.2). Recall also that $q = \psi(0)$.


2. Ruin Models 369 


FIGURE 10. A realization of a renewal process, with the levels $L_1 = u - Y_1$, $L_2 = u - Y_1 - Y_2$, $L_3 = u - Y_1 - Y_2 - Y_3$, and $0$ marked. The $Y$'s are the drops below the corresponding levels; in particular, $Y_1$ is the drop below the level $u$.


The formula (2.7.3) is very interesting—the ruin probability for u = 0 depends only on 
the security loading and does not depend on the distribution of X’s at all. 
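The distribution-free character of (2.7.3) can be checked by simulation. The following is only a rough Monte Carlo sketch under assumptions that are not part of the text: the function name `ruin_prob_from_zero`, the choice $\lambda = 1$, $\theta = 1$, the two claim distributions (exponential with mean 1 and a constant claim equal to 1), and the stopping barrier, above which later ruin is treated as negligible.

```python
import random

def ruin_prob_from_zero(draw_claim, mean_claim, theta, n_paths=20000,
                        barrier=30.0, lam=1.0, seed=1):
    """Monte Carlo estimate of psi(0) for R_t = c*t - S_(t), c = (1+theta)*m*lam.

    A path is stopped either at ruin or when the surplus reaches `barrier`;
    ruin after reaching the barrier is ignored (an approximation).
    """
    rng = random.Random(seed)
    c = (1.0 + theta) * mean_claim * lam
    ruined = 0
    for _ in range(n_paths):
        surplus = 0.0
        # The surplus can become negative only at a claim epoch, so it is
        # enough to inspect the process right after each claim.
        while 0.0 <= surplus < barrier:
            surplus += c * rng.expovariate(lam)  # premiums until the next claim
            surplus -= draw_claim(rng)           # the claim itself
        if surplus < 0.0:
            ruined += 1
    return ruined / n_paths

theta = 1.0
est_exp = ruin_prob_from_zero(lambda rng: rng.expovariate(1.0), 1.0, theta)
est_const = ruin_prob_from_zero(lambda rng: 1.0, 1.0, theta)
print(est_exp, est_const, 1.0 / (1.0 + theta))
```

Both estimates should be close to $1/(1+\theta)$, up to Monte Carlo error, although the two claim distributions are quite different.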

Now note that, since $\tau_0$ is an improper r.v., i.e., $P(\tau_0 < \infty) = \psi(0) < 1$, the overshoot $|R_{\tau_0}|$ (i.e., the deficit at the moment of ruin) is also improper: it is defined only in the case $\tau_0 < \infty$. Let us consider the conditional distribution of $|R_{\tau_0}|$ given $\tau_0 < \infty$, more precisely, the conditional d.f. $F_1(x) = P(|R_{\tau_0}| \le x \mid \tau_0 < \infty)$.

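From (2.7.2)-(2.7.3) it follows that

$$
F_1(x) = \frac{P(\tau_0 < \infty,\ |R_{\tau_0}| \le x)}{P(\tau_0 < \infty)} = \frac{1}{m} \int_0^x (1 - F(y))\,dy.
$$

The conditional density equals

$$
f_1(x) = \frac{1}{m}\,(1 - F(x)). \tag{2.7.4}
$$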


2.7.2 The renewal approximation 


Let us return to $R_t$. Starting from level $u$, with probability $q = 1/(1+\theta)$, the process at some time will drop below the initial level $u$. If it happens, the size of the drop below the level $u$ will be a r.v. $Y_1$ having the above d.f. $F_1(x)$.

The process $N_t$ is homogeneous, and the times between consecutive drops have the lack of memory property. Consequently, after the drop mentioned, the process will start to run as if it were at the beginning, with the exception that now the starting position is $L_1 = u - Y_1$. See Fig. 10.

Since the next drop cannot occur immediately after the first drop, starting from $L_1$, the process will be moving up for a while. Hence, $L_1$ is a local minimum of the process.

The process will fall below the new level $L_1$ with the same probability $q$. If it happens, the size of the new drop below the level $L_1$ will be a r.v. $Y_2$ which will not depend on $Y_1$ and will have the same distribution $F_1$. The value of the process at the moment when it falls below the level $L_1$ is $L_2 = u - Y_1 - Y_2$. See again Fig. 10.

Continuing to reason in the same fashion, we define the r.v. $Y_n$ as the size of the $n$th drop below the previous $(n-1)$th level, and the r.v. $L_n = u - Y_1 - \dots - Y_n$. The r.v. $L_n$ is the $n$th local minimum of $R_t$.




The process $L_n$ is called a renewal process, and the values of $L_n$ are called record values; see Fig. 10.

Since the probability of falling below the current value, that is, $q$, is less than one, the sequence of record values, or drops, is not infinite: it runs up to the moment when the process leaves the lowest level and never falls below it again. Denote by $K$ the total number of record values, not counting $u$. Then $P(K = n) = pq^n$, where $p = 1 - q$, so $K$ has a geometric distribution. Then the lowest level of the process $R_t$ is

$$
L = \min_t R_t = L_K = u - Z_K, \quad \text{where } Z_K = \sum_{k=1}^{K} Y_k,
$$

and the r.v.'s $Y_k$ are independent and have the common d.f. $F_1$.

If $K = 0$, that is, the process never falls below the initial level, then we set $Z_K = 0$ and $L = u$. This occurs with probability $p = 1 - q$.

It is easy to see that the no-ruin probability

$$
\varphi(u) = P(L \ge 0) = P(Z_K \le u), \tag{2.7.5}
$$

and we have come to a familiar object: the distribution of the sum of a random number of independent r.v.'s.

The ruin probability $\psi(u) = 1 - \varphi(u)$.
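In accordance with (3.3.1.2) and (2.7.3),

$$
\varphi(u) = \sum_{n=0}^{\infty} p\,q^n F_1^{*n}(u), \quad \text{where } p = \frac{\theta}{1+\theta} \text{ and } q = \frac{1}{1+\theta}. \tag{2.7.6}
$$

We can apply to (2.7.6) the methods of Section 3.3.1.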


EXAMPLE 1. Let the $X$'s take on values 1 or 2 with probabilities 0.75 and 0.25, respectively. The reader is invited to verify that in this case, $m = 1.25$, and the density

$$
f_1(x) = \frac{1}{m}\,(1 - F(x)) =
\begin{cases}
0.8 & \text{if } 0 \le x < 1, \\
0.2 & \text{if } 1 \le x < 2,
\end{cases}
$$

and equals 0 otherwise. This means that $f_1(x) = 0.8\,g_1(x) + 0.2\,g_2(x)$, where $g_1(x)$ and $g_2(x)$ are the densities of the uniform distributions on $[0,1]$ and $[1,2]$, respectively. In other words, $f_1$ is a mixture of uniform distributions; see also Exercise 18.

Let $\theta = 0.2$. Then $q = \frac{1}{1.2} = \frac{5}{6}$. For an integer $k$, the part of (2.7.6) corresponding to the summation $\sum_{n=k+1}^{\infty}$ does not exceed $q^{k+1}$; see Section 3.3.1.1 for detail. For example, $q^{26} < 0.009$, and if we are satisfied with such an accuracy, we can restrict ourselves to $\sum_{n=0}^{25}$.

Numerical estimation of such a sum is not a very complicated problem. Denoting by $G_1$, $G_2$ the corresponding uniform d.f.'s, for the convolution $F_1^{*n}$ we can write

$$
F_1^{*n} = (0.8\,G_1 + 0.2\,G_2)^{*n} = \sum_{k=0}^{n} \binom{n}{k} (0.8)^k (0.2)^{n-k}\, G_1^{*k} * G_2^{*(n-k)},
$$

see (2.2.1.1). There exist explicit, though cumbersome, formulas for convolutions of uniform distributions, so with good software one should not have a problem in calculations.
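As a numerical illustration of (2.7.6), here is a minimal sketch that does not use the explicit convolution formulas mentioned above. Instead it relies on an assumption of ours: the drops are discretized on a grid of step `h`, and the compound geometric distribution is computed from the renewal-type recursion $g_0 = p$, $g_k = q\sum_{j=1}^{k} f_j g_{k-j}$ (a standard device, not necessarily the method of Section 3.3.1). The names `no_ruin_prob` and `F1_example` are hypothetical.

```python
import math

def no_ruin_prob(F1, theta, u_max, h=0.005):
    """Approximate phi(u) on the grid 0, h, 2h, ..., u_max.

    F1 is the d.f. of a single drop Y. The mass F1(j*h) - F1((j-1)*h) is moved
    to the grid point j*h (drops are rounded upward), so the result slightly
    underestimates the true phi(u).
    """
    q = 1.0 / (1.0 + theta)
    p = 1.0 - q
    n = int(round(u_max / h))
    f = [0.0] + [F1(j * h) - F1((j - 1) * h) for j in range(1, n + 1)]
    g = [0.0] * (n + 1)          # g[k] approximates P(Z_K = k*h)
    g[0] = p                     # K = 0: the process never drops below u
    for k in range(1, n + 1):
        g[k] = q * sum(f[j] * g[k - j] for j in range(1, k + 1))
    phi, total = [], 0.0
    for mass in g:               # cumulative sums give the d.f. of Z_K
        total += mass
        phi.append(total)
    return phi

def F1_example(x):
    """d.f. with density 0.8 on [0,1) and 0.2 on [1,2), as in Example 1."""
    if x <= 0.0:
        return 0.0
    if x <= 1.0:
        return 0.8 * x
    if x <= 2.0:
        return 0.8 + 0.2 * (x - 1.0)
    return 1.0

phi = no_ruin_prob(F1_example, theta=0.2, u_max=5.0, h=0.01)
print(phi[0], phi[100], phi[500])   # phi(0), phi(1), phi(5)
```

The first value printed is the jump $p = 1 - q$ of the d.f. at zero; refining `h` reduces the discretization error.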




In accordance with (2.7.5), $\varphi(u)$ is the d.f. of $Z_K$. Next, we compute the m.g.f. of $Z_K$ or, equivalently, that of its d.f. $\varphi(u)$. By (3.1.5) and/or (3.3.1.16),

$$
M_\varphi(z) = \int_0^\infty e^{zu}\,d\varphi(u) = \frac{p}{1 - q\,M_Y(z)}, \tag{2.7.7}
$$

where $M_Y(z)$ is the m.g.f. of the r.v.'s $Y_k$. Using (2.7.4) and integrating by parts, we get that for all $z > 0$ for which $M_X(z)$ is finite,

$$
M_Y(z) = \frac{1}{m} \int_0^\infty e^{zx} (1 - F(x))\,dx = -\frac{1}{mz}\,(1 - F(0)) + \frac{1}{mz} \int_0^\infty e^{zx}\,dF(x) = \frac{1}{mz}\,(M_X(z) - 1),
$$

because, in view of (2.7.1), $F(0) = 0$. Inserting this into (2.7.7) and substituting the values of $p$ and $q$, it is easy to calculate that

$$
M_\varphi(z) = \frac{\theta m z}{1 + (1+\theta) m z - M_X(z)}. \tag{2.7.8}
$$

If for a particular $X$ the m.g.f. (2.7.8) is familiar to us, we can determine $\varphi(u)$.

For some cases, it is convenient to rewrite (2.7.8) as

$$
M_\varphi(z) = \frac{\theta}{1+\theta} + \frac{1}{1+\theta} \cdot \frac{\theta\,(M_X(z) - 1)}{1 + (1+\theta) m z - M_X(z)}. \tag{2.7.9}
$$

The last formula reflects the following circumstance. With probability $p$ the r.v. $K = 0$, and since in this case $Z_K = 0$, the d.f. of $Z_K$, that is, $\varphi(u)$, makes a jump of $p$ at zero. In view of (2.7.6), this may be represented in the following form:

$$
\varphi(u) = p + p \sum_{n=1}^{\infty} q^n F_1^{*n}(u). \tag{2.7.10}
$$

The two terms in (2.7.9) correspond to the respective two terms in (2.7.10).
EXAMPLE 2. Let $F(x)$ be a mixture of exponential distributions, say, the tail

$$
\bar F(x) = 1 - F(x) = \frac{1}{2}\,e^{-x} + \frac{1}{2}\,e^{-x/3}.
$$

Then $m = \frac{1}{2} \cdot 1 + \frac{1}{2} \cdot 3 = 2$, and for $z < 1/3$,

$$
M_X(z) = \frac{1}{2} \cdot \frac{1}{1 - z} + \frac{1}{2} \cdot \frac{1}{1 - 3z}.
$$

To make calculations more illustrative, set $\theta = 0.5$ in our example, realizing that it is not very realistic. Substituting it into (2.7.9), the reader can verify that in this case,

$$
M_\varphi(z) = \frac{1}{3} + \frac{2}{3} \cdot \frac{2 - 3z}{2 - 18z + 18z^2}. \tag{2.7.11}
$$

The equation $2 - 18z + 18z^2 = 0$ has two solutions: $z_1 = 0.5 + \sqrt{5}/6 \approx 0.87$, and $z_2 = 1 - z_1 \approx 0.13$.




Using the method of partial fractions we write

$$
\frac{2 - 3z}{2 - 18z + 18z^2} = \frac{2 - 3z}{18\,(z_1 - z)(z_2 - z)} = \frac{1}{18} \left( \frac{c_1}{z_1 - z} + \frac{c_2}{z_2 - z} \right), \tag{2.7.12}
$$

where $c_1$, $c_2$ are constants that we should find. Bringing the r.-h.s. of (2.7.12) to a common denominator, we get that $c_1 + c_2 = 3$, $c_1 z_2 + c_2 z_1 = 2$.

We will write all solutions up to the second digit. Solving the equations for $c_1$ and $c_2$, we readily get that $c_1 = 0.83$, $c_2 = 2.17$. Then

$$
\frac{2 - 3z}{2 - 18z + 18z^2} = \frac{0.05}{1 - z/z_1} + \frac{0.95}{1 - z/z_2}, \tag{2.7.13}
$$

where the denominators are precise but the coefficients in the numerators are computed up to the second digit. Together with (2.7.11), we have with the same accuracy that

$$
M_\varphi(z) = \frac{1}{3} + \frac{2}{3} \left( \frac{0.05}{1 - z/z_1} + \frac{0.95}{1 - z/z_2} \right).
$$

The last term is a mixture of exponential m.g.f.'s. Consequently,

$$
\varphi(u) = \frac{1}{3} + \frac{2}{3} \left( 0.05\,F_{z_1}(u) + 0.95\,F_{z_2}(u) \right),
$$

where $F_z$ stands for the exponential d.f. with parameter $z$. Eventually,

$$
\begin{aligned}
\psi(u) = 1 - \varphi(u) &= \frac{2}{3} - \frac{2}{3} \left( 0.05\,F_{z_1}(u) + 0.95\,F_{z_2}(u) \right) \\
&= \frac{2}{3} \left( 0.05 \exp\{-z_1 u\} + 0.95 \exp\{-z_2 u\} \right) \approx 0.03 \exp\{-0.87u\} + 0.63 \exp\{-0.13u\}.
\end{aligned}
$$
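The arithmetic of Example 2 is easy to reproduce. The sketch below recomputes the roots, the partial-fraction constants, and the weights of the two exponential terms at full floating-point precision; the variable names (`a1`, `a2`, `psi`) are ours and are used only for illustration.

```python
import math

theta = 0.5

# Roots of 2 - 18 z + 18 z^2 = 0.
disc = math.sqrt(18.0 ** 2 - 4.0 * 18.0 * 2.0)
z1 = (18.0 + disc) / 36.0
z2 = (18.0 - disc) / 36.0

# Partial fractions: c1 + c2 = 3 and c1*z2 + c2*z1 = 2.
c1 = (2.0 - 3.0 * z1) / (z2 - z1)
c2 = 3.0 - c1

# Weights of the two exponential m.g.f.'s 1/(1 - z/z_i).
a1 = c1 / (18.0 * z1)
a2 = c2 / (18.0 * z2)

def psi(u):
    """Ruin probability of Example 2 with unrounded coefficients."""
    q = 1.0 / (1.0 + theta)
    return q * (a1 * math.exp(-z1 * u) + a2 * math.exp(-z2 * u))

print(round(z1, 2), round(z2, 2), round(c1, 2), round(c2, 2))
print(round(a1, 2), round(a2, 2), psi(0.0))
```

Since the weights `a1` and `a2` sum to one, `psi(0.0)` equals $1/(1+\theta) = 2/3$, in agreement with (2.7.3).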


Route 2 => page 377 


2.7.3 The Cramér-Lundberg approximation 


In conclusion, we present without proof one more celebrated result of Risk Theory.

Theorem 6 Let $\gamma > 0$ be the adjustment coefficient satisfying (2.2.22). Then

$$
\psi(u) \sim C e^{-\gamma u} \quad \text{as } u \to \infty, \tag{2.7.14}
$$

where

$$
C = \frac{m\theta}{M_X'(\gamma) - m(1+\theta)}. \tag{2.7.15}
$$

Proofs may be found, e.g., in [10] or [50].

To clarify the significance of the last formula, assume that $X$ is exponential. Then $M_X(z) = 1/(1 - mz)$, and $M_X'(z) = m/(1 - mz)^2$. By (2.5.3), $\gamma = \theta/[m(1+\theta)]$. Substituting it into (2.7.15), we readily get $C = 1/(1+\theta)$, which is consistent with the precise formula (2.5.4).
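The constant (2.7.15) can also be evaluated numerically. The sketch below assumes that the adjustment coefficient is the positive root of $M_X(z) = 1 + (1+\theta)mz$ (this is where the denominator of (2.7.8) vanishes; we take it to be the content of (2.2.22)), and finds it by bisection. The function names and the parameter values $m = 2$, $\theta = 0.25$ for the exponential case are hypothetical; the second case is the mixture from Example 2 of Section 2.7.2.

```python
import math

def adjustment_coefficient(mgf, m, theta, z_hi, tol=1e-12):
    """Positive root of mgf(z) = 1 + (1+theta)*m*z on (0, z_hi), by bisection.

    Assumes mgf is convex and blows up near z_hi, so that the difference
    is negative just to the right of zero and positive near z_hi.
    """
    h = lambda z: mgf(z) - 1.0 - (1.0 + theta) * m * z
    lo, hi = 1e-9, z_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cl_constant(mgf_prime, m, theta, gamma):
    """The constant C from (2.7.15)."""
    return m * theta / (mgf_prime(gamma) - m * (1.0 + theta))

# Exponential claims with mean m.
m, theta = 2.0, 0.25
gamma_exp = adjustment_coefficient(lambda z: 1.0 / (1.0 - m * z), m, theta,
                                   z_hi=(1.0 - 1e-6) / m)
C_exp = cl_constant(lambda z: m / (1.0 - m * z) ** 2, m, theta, gamma_exp)

# Mixture of exponentials with means 1 and 3, theta = 0.5.
m2, theta2 = 2.0, 0.5
mgf_mix = lambda z: 0.5 / (1.0 - z) + 0.5 / (1.0 - 3.0 * z)
mgf_mix_prime = lambda z: 0.5 / (1.0 - z) ** 2 + 1.5 / (1.0 - 3.0 * z) ** 2
gamma_mix = adjustment_coefficient(mgf_mix, m2, theta2, z_hi=(1.0 - 1e-6) / 3.0)
C_mix = cl_constant(mgf_mix_prime, m2, theta2, gamma_mix)

print(gamma_exp, C_exp, gamma_mix, C_mix)
```

For the mixture, the root coincides with the smaller root of $2 - 18z + 18z^2 = 0$, and `C_mix` reproduces the coefficient of the slowly decaying exponential in the explicit ruin probability, which is exactly what the asymptotic relation (2.7.14) predicts.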




Another example is given in Exercise 23. 
The reader may find further interesting approximations for the ruin probability and ruin 
time as well as further references, e.g., in [12] and [44]. 


2.7.4 Proof of Theorem 5 from Section 2.7.1 


Usually, this theorem is proved with the use of differential equations. Below, we mainly follow a different proof from [10] by S. Asmussen. In part, we do it for diversity, but also because the latter proof is direct and illustrative.

So, we consider the case when $R_0 = 0$. Set $\tau = \tau_0$, and for a set $A$ from the real line denote by $I_A(x)$ the indicator of $A$, that is, $I_A(x) = 1$ if $x \in A$ and $I_A(x) = 0$ otherwise. Let

$$
\eta_A = \int_0^\tau I_A(R_t)\,dt. \tag{2.7.16}
$$

Since $I_A(R_t) = 1$ or 0, depending on whether $R_t$ is in $A$ or not, the r.v. $\eta_A$ is equal to the amount of time the process $R_t$ spends in $A$ before the time moment $\tau$, that is, before ruin. Our proof is based on

Lemma 7 For any bounded set $A$,

$$
E\{\eta_A\} = \frac{1}{c}\,|A|,
$$

where $|A|$ is the length of $A$ (and $c = (1+\theta)m\lambda$, the premium rate).

We will prove this at the end of the section. From Lemma 7 and (2.7.16) it follows that

$$
|A| = c\,E\{\eta_A\} = c\,E\left\{ \int_0^\tau I_A(R_t)\,dt \right\}. \tag{2.7.17}
$$

Next, we show that (2.7.17) implies that for any bounded function $g(v)$ defined and integrable on $[0, \infty)$,

$$
\int_0^\infty g(v)\,dv = c\,E\left\{ \int_0^\tau g(R_t)\,dt \right\}. \tag{2.7.18}
$$

If $g(v) = I_A(v)$, then (2.7.18) coincides with (2.7.17). Consider a piecewise constant function

$$
g(v) = \sum_k g_k I_{A_k}(v), \tag{2.7.19}
$$

where $g_1, g_2, \dots$ are numbers, and $A_1, A_2, \dots$ are disjoint sets. The function $g(v)$ takes on the constant value $g_k$ if $v \in A_k$. Since the sets $A_k$ are disjoint,

$$
\int_0^\infty g(v)\,dv = \sum_k g_k |A_k|. \tag{2.7.20}
$$

By (2.7.17),

$$
\begin{aligned}
\int_0^\infty g(v)\,dv = \sum_k g_k |A_k| &= \sum_k g_k\, c\,E\left\{ \int_0^\tau I_{A_k}(R_t)\,dt \right\} \\
&= c\,E\left\{ \int_0^\tau \sum_k g_k I_{A_k}(R_t)\,dt \right\} = c\,E\left\{ \int_0^\tau g(R_t)\,dt \right\},
\end{aligned}
$$




FIGURE 11. (Labels in the figure: $X$, the whole drop; $t + dt$; the overshoot.)


which proves (2.7.18) for any function of the type (2.7.19). Since any bounded function may be approximated with any desired accuracy by a piecewise constant function, (2.7.18) is true for any bounded $g(v)$.

Having (2.7.18), we can turn to the direct proof of Theorem 5. Our reasoning is close to what we already did in Section 2.5.1.

The process $R_t$ jumps down during an infinitesimally small interval $[t, t + dt]$ only if $N_t$ jumps up (a claim arrives). In accordance with (4.2.2.1), the probability that this happens equals $\lambda\,dt$. Denote by $R_{t-0}$ the value of the process before this jump, and by $X$ the size of the jump (that is, the claim). We omit the index in $X$. Since $dt$ is infinitesimally small, we may identify the value of the process after the jump with $R_t$, so $R_t = R_{t-0} - X$; see also Fig. 11.

Consider the event $\mathcal{E}_{tx}(dt)$ consisting in the following:

(i) During the interval $(t, t + dt]$ ruin occurred.

(ii) It occurred for the first time, and hence $t < \tau$ and $R_{t-0} \ge 0$.

(iii) The overshoot $|R_\tau|$ has exceeded some level $x \ge 0$.

Then $P(\mathcal{E}_{tx}(dt)) = P(t < \tau,\ X > R_{t-0} + x)\,\lambda\,dt$.

Let $I(\mathcal{E})$ stand for the indicator of an event $\mathcal{E}$, that is, $I(\mathcal{E}) = 1$ if $\mathcal{E}$ occurs, and $= 0$ otherwise. (In the function $I_A(x)$ above, $A$ is a set from the real line, while $\mathcal{E}$ is an event in the original space $\Omega$ of elementary outcomes.)

Using the formula for total expectation (0.7.2.1), we can write that

$$
\begin{aligned}
P(\mathcal{E}_{tx}(dt)) &= E\{ I(t < \tau)\, I(X > R_{t-0} + x) \}\,\lambda\,dt \\
&= E\{ I(t < \tau)\, E\{ I(X > R_{t-0} + x) \mid R_{t-0},\ I(t < \tau) \} \}\,\lambda\,dt \\
&= E\{ I(t < \tau)\, P(X > R_{t-0} + x \mid R_{t-0},\ I(t < \tau)) \}\,\lambda\,dt.
\end{aligned}
$$

Given $R_{t-0}$ and $t < \tau$, the conditional probability $P(X > R_{t-0} + x \mid R_{t-0},\ I(t < \tau))$ is the probability that the amount of a claim will be larger than $R_{t-0} + x$. Then, setting $\bar F(x) = P(X_i > x)$, where $X_i$ is a claim, we have

$$
P(\mathcal{E}_{tx}(dt)) = E\{ I(t < \tau)\, \bar F(R_{t-0} + x) \}\,\lambda\,dt. \tag{2.7.21}
$$

For a fixed $t$, the probability that a jump occurs at time $t$ is zero. Consequently, for a fixed $t$, the distributions of the r.v.'s $R_{t-0}$ and $R_t$ are the same. Then we can replace $R_{t-0}$ by $R_t$ in the right member of (2.7.21). Thus,

$$
P(\mathcal{E}_{tx}(dt)) = E\{ \bar F(R_t + x)\, I(t < \tau) \}\,\lambda\,dt.
$$




FIGURE 12. 


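Summing up the probabilities $P(\mathcal{E}_{tx}(dt))$, or more precisely, integrating in $t$, we have

$$
\begin{aligned}
P(\tau < \infty,\ |R_\tau| > x) &= \int_0^\infty P(\mathcal{E}_{tx}(dt)) = \lambda \int_0^\infty E\{ \bar F(R_t + x)\, I(t < \tau) \}\,dt \\
&= \lambda\,E\left\{ \int_0^\infty \bar F(R_t + x)\, I(t < \tau)\,dt \right\} = \lambda\,E\left\{ \int_0^\tau \bar F(R_t + x)\,dt \right\}.
\end{aligned}
$$

Consecutively applying (2.7.18), the fact that $c = (1+\theta)m\lambda$, and the variable change $y = x + v$, we get that

$$
\begin{aligned}
P(\tau < \infty,\ |R_\tau| > x) &= \frac{\lambda}{c} \int_0^\infty \bar F(x + v)\,dv = \frac{1}{(1+\theta)m} \int_0^\infty \bar F(x + v)\,dv \\
&= \frac{1}{(1+\theta)m} \int_x^\infty \bar F(y)\,dy.
\end{aligned}
\tag{2.7.22}
$$

Setting $x = 0$, and recalling that $m = \int_0^\infty \bar F(y)\,dy$ by virtue of (0.2.2.2), we write

$$
P(\tau < \infty) = P(\tau < \infty,\ |R_\tau| > 0) = \frac{1}{(1+\theta)m} \int_0^\infty \bar F(v)\,dv = \frac{1}{1+\theta}. \tag{2.7.23}
$$

To get (2.7.2), it remains to subtract (2.7.22) from (2.7.23).

To complete the proof, we should provide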
To complete the proof, we should provide 


Proof of Lemma 7. Let us fix, for a while, $t > 0$ and consider for $s \in [0, t]$ the process $\tilde R_s = R_t - R_{t-s}$. The process $\tilde R_s$ may be interpreted as $R_s$ in reversed time. Note that $\tilde R_0 = 0$, and $\tilde R_t = R_t$ because $R_0 = 0$. The process $\tilde R_s$ moves up linearly with the same slope $c$, and drops down with the same intensity $\lambda$ as $R_s$. The distribution of jumps is also the same as for $R_s$, and the only difference is that, if $R_s$ has a jump at a point $s_1$, the corresponding jump of $\tilde R_s$ occurs at time $t - s_1$; see also Fig. 12.

However, the last fact has no effect on the distribution of trajectories of $\tilde R_s$, since the intensity of jumps does not depend on time, and jumps are equally likely to occur at any time. Thus, the distribution of the process $\tilde R_s$ is the same as for $R_s$, that is, the probability of any collection of possible trajectories of $\tilde R_s$ equals the same probability for $R_s$.

On the other hand, $R_t = \tilde R_t = \tilde R_s + R_{t-s}$. So, $\tilde R_t \ge \tilde R_s$ for all $s \le t$ if and only if $R_{t-s} \ge 0$ for all $s \le t$. This is the same as $R_s \ge 0$ for all $s \le t$. Thus, for any set $A$, the event

$$
\{R_t \in A,\ t < \tau\} = \{R_t \in A,\ R_s \ge 0 \text{ for all } s \le t\} = \{\tilde R_t \in A,\ \tilde R_t \ge \tilde R_s \text{ for all } s \le t\}.
$$




FIGURE 13. The time spent in $A$ at the moments of leadership. “Thick” segments indicate moments of leadership.


(The last step is true because $\tilde R_t = R_t$.) Since the processes $\tilde R_s$ and $R_s$ have the same distribution, this implies, in turn, that

$$
P(R_t \in A,\ t < \tau) = P(\tilde R_t \in A,\ \tilde R_t \ge \tilde R_s \text{ for all } s \le t) = P(R_t \in A,\ R_t \ge R_s \text{ for all } s \le t). \tag{2.7.24}
$$

Note also that $I_A(R_t) = I(R_t \in A)$, by definition of $I(\mathcal{E})$. Then, by (2.7.24), for a bounded set $A$,

$$
\begin{aligned}
E\{\eta_A\} &= E\left\{ \int_0^\tau I_A(R_t)\,dt \right\} = E\left\{ \int_0^\infty I(R_t \in A)\, I(t < \tau)\,dt \right\} = E\left\{ \int_0^\infty I(R_t \in A,\ t < \tau)\,dt \right\} \\
&= \int_0^\infty E\{ I(R_t \in A,\ t < \tau) \}\,dt = \int_0^\infty P(R_t \in A,\ t < \tau)\,dt \\
&= \int_0^\infty P(R_t \in A,\ R_t \ge R_s \text{ for all } s \le t)\,dt = E\left\{ \int_0^\infty I(R_t \in A,\ R_t \ge R_s \text{ for all } s \le t)\,dt \right\}.
\end{aligned}
\tag{2.7.25}
$$

Since $A$ is bounded, there exists $M$ such that $A \subset [0, M]$. Denote the last integral in (2.7.25) by $J$. This is the total time when $R_t$ is in $A$, being at the same time the largest value with respect to all previous moments $s \le t$. We can also call such $t$'s moments of leadership.

If $R_T > M$ at some moment $T$, then “in the future”, for $t > T$, at a leadership moment the value of the process will be larger than $M$, and hence will not be in $A$.

Consequently, $J$ is exactly (!) equal to $|A|/c$, the length of $A$ divided by the slope of $R_t$ at points of growth; see Fig. 13. Note also that $J \le |A|/c \le M/c$ in any case. It remains to use the condition $R_t \xrightarrow{P} \infty$. Let $T > 0$. The last expected value in (2.7.25) equals

$$
\begin{aligned}
E\{J\} &= E\{J \mid R_T > M\}\,P(R_T > M) + E\{J \mid R_T \le M\}\,P(R_T \le M) \\
&= (|A|/c)\,P(R_T > M) + E\{J \mid R_T \le M\}\,P(R_T \le M).
\end{aligned}
\tag{2.7.26}
$$

Let $T \to \infty$. The first term in (2.7.26) converges to $|A|/c$, since $P(R_T > M) \to 1$ for any $M$. The second term does not exceed $(|A|/c)\,P(R_T \le M) \to 0$ as $T \to \infty$. ∎




2.8 Some recurrent relations and computational aspects 


Here, we briefly discuss how to compute ruin probabilities for finite time horizons using recursive methods. The relations we consider are based on the first step analysis. We restrict ourselves to the case $R_t = u + c_t - S_{(t)}$, where $u$ is the initial surplus, $S_{(t)}$ is the loss process, and $c_t$ is the aggregate amount of (positive) cash collected by time $t$. It will be convenient for us to consider the no-ruin probability $\varphi_T(u)$. The ruin probability $\psi_T(u) = 1 - \varphi_T(u)$.

We start with a particular problem which requires only common sense. 


EXAMPLE 1 ([153, N2]†). BIB is a new insurer writing homeowners policies. You are given: (a) Initial surplus = $15; (b) Number of insured homes = 3; (c) Premium per home = $10; (d) Premiums are paid at the start of each year; (e) Size of each claim = $40; (f) Claims are paid immediately; (g) There are no expenses; (h) There is no investment income.

Each homeowner files at most one claim per year. The probability that a given home- 
owner files a claim in year 1 is 20%, and in year 2, it is 10%. Claims are independent. 
Calculate the probability that BIB has positive surplus at the end of year 2. 

The insurer will not have a positive surplus at the end if there is ruin in the middle of the period, so we are computing the no-ruin probability. At the beginning, the insurer has $45, and in order not to be ruined in the first year, there should not be more than one claim of $40.

If there is no claim in the first year, then the insurer will have $75 at the beginning of the second year, and to keep a positive surplus, the insurer must not have more than one claim. If there is one claim in the first year, the insurer will have just $35 at the beginning of the second year. In this case, there will be no ruin only if there is no claim in this year.

The number of claims during each period has a binomial distribution, so the no-ruin probability

$$
\varphi_2(15) = (0.8)^3 \left[ (0.9)^3 + \binom{3}{1} (0.9)^2 (0.1) \right] + \binom{3}{1} (0.8)^2 (0.2) \cdot (0.9)^3 = 0.7776.
$$
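The same answer can be obtained by brute-force enumeration of the surplus after each year. The sketch below is only an illustration; the function names `binom_pmf` and `bib_no_ruin` and the dictionary-of-states bookkeeping are our own choices, and "no ruin" is taken to mean that the surplus stays strictly positive after each year's claims.

```python
from math import comb

def binom_pmf(n, k, p):
    """P(exactly k claims among n independent homes, each with probability p)."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def bib_no_ruin(surplus=15, homes=3, premium=10, claim=40, probs=(0.2, 0.1)):
    """Probability that the surplus stays positive through all listed years."""
    states = {surplus: 1.0}                # surplus level -> probability
    for p in probs:                        # one pass per year
        nxt = {}
        for s, w in states.items():
            start = s + homes * premium    # premiums are paid at the start
            for k in range(homes + 1):     # k claims during the year
                end = start - k * claim
                if end > 0:                # paths with ruin are dropped
                    nxt[end] = nxt.get(end, 0.0) + w * binom_pmf(homes, k, p)
        states = nxt
    return sum(states.values())

print(round(bib_no_ruin(), 4))
```

The printed value agrees with the 0.7776 obtained above.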


Now, we present the same logic in a more formal way. First, let time be discrete, and $S_{(t)} = S_t = X_1 + \dots + X_t$, where $X_j$ is the size of the $j$th claim, and the $X$'s are i.i.d. r.v.'s. For ruin not to happen during the time interval $[0, T]$, the first claim $X_1$ should not exceed $u + c_1$, and starting from the new level $u + c_1 - X_1$, the process should not take on negative values during time $T - 1$. We can unify both cases, $X_1 \le u + c_1$ and $X_1 > u + c_1$, by setting, by definition, the no-ruin probability $\varphi_T(u) = 0$ if $u < 0$. Then, given $X_1$, the conditional no-ruin probability during the last $T - 1$ periods after the first period is $\varphi_{T-1}(u + c_1 - X_1)$. In view of the independence of the $X$'s, from this it follows that the no-ruin probability

$$
\varphi_T(u) = E\{\varphi_{T-1}(u + c_1 - X_1)\}. \tag{2.8.1}
$$


†Reprinted with permission of the Casualty Actuarial Society.




For $T = \infty$, setting $\varphi(u) = \varphi_\infty(u)$, we can rewrite (2.8.1) as

$$
\varphi(u) = E\{\varphi(u + c_1 - X_1)\}, \tag{2.8.2}
$$

which is an equation for $\varphi(u)$.

Consider, for example, the discrete case when the $X_j$ take on values $x_1, x_2, \dots$ with probabilities $f_1, f_2, \dots$, respectively. Then (2.8.1) may be written as

$$
\varphi_T(u) = \sum_j \varphi_{T-1}(u + c_1 - x_j)\, f_j. \tag{2.8.3}
$$

It is worth emphasizing that in the last sum, as a matter of fact, the terms for which $x_j > u + c_1$ vanish.

For $T = \infty$, we may write (2.8.3) as

$$
\varphi(u) = \sum_j \varphi(u + c_1 - x_j)\, f_j.
$$
j 
The reader is invited to verify that when $c_1 = 1$ and $X_j$ takes on values 0 or 2, so that the surplus moves by $\pm 1$ in each period, the last equation leads to the classical equation for the ruin probability for the simple random walk; see (4.4.3.10).

For $X$'s taking many values and for a finite $T$, calculations are not as nice as they were in Section 4.4.3.2.2, and one should use numerical procedures. Here, we consider only simple examples in order to demonstrate the logic of calculations.


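EXAMPLE 2. Let the unit of time be a year, and let the premium $c = 4$ be paid at the beginning of each year. Assume that the losses $X = 2, 4, 10$ with probabilities $f_1 = 0.5$, $f_2 = 0.4$, $f_3 = 0.1$, respectively, are paid at the end of each year. Let $T = 2$. By (2.8.3),

$$
\begin{aligned}
\varphi_2(u) &= \varphi_1(u + 4 - 2)\,\frac{1}{2} + \varphi_1(u + 4 - 4)\,\frac{2}{5} + \varphi_1(u + 4 - 10)\,\frac{1}{10} \\
&= \frac{1}{10} \left( 5\varphi_1(u + 2) + 4\varphi_1(u) + \varphi_1(u - 6) \right).
\end{aligned}
\tag{2.8.4}
$$

Here, it makes sense to consider only integer $u$'s. We have $\varphi_1(u) = 1$ for $u = 6, 7, \dots$. If $u = 0, \dots, 5$, ruin may happen in one period only if the biggest claim occurs, so $\varphi_1(u) = 0.9$. Thus,

$$
\begin{aligned}
\varphi_2(u) &= \tfrac{1}{10}(5 \cdot 0.9 + 4 \cdot 0.9 + 0) = 0.81 && \text{for } u = 0, \dots, 3; \\
\varphi_2(u) &= \tfrac{1}{10}(5 \cdot 1 + 4 \cdot 0.9 + 0) = 0.86 && \text{for } u = 4, 5; \\
\varphi_2(u) &= \tfrac{1}{10}(5 \cdot 1 + 4 \cdot 1 + 0.9) = 0.99 && \text{for } u = 6, \dots, 11; \\
\varphi_2(u) &= \tfrac{1}{10}(5 \cdot 1 + 4 \cdot 1 + 1) = 1 && \text{for } u \ge 12.
\end{aligned}
$$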
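The recursion (2.8.3) translates almost literally into code. The following sketch uses the data of Example 2; the names `LOSSES`, `PREMIUM`, and `phi` are ours, and memoization via `lru_cache` is merely a convenience for larger horizons.

```python
from functools import lru_cache

LOSSES = ((2, 0.5), (4, 0.4), (10, 0.1))   # pairs (x_j, f_j) from Example 2
PREMIUM = 4                                 # c_1, paid at the start of a year

@lru_cache(maxsize=None)
def phi(T, u):
    """No-ruin probability over T periods for an integer initial surplus u."""
    if u < 0:
        return 0.0            # convention: ruin has already happened
    if T == 0:
        return 1.0            # nothing can go wrong in zero periods
    # First-step analysis: condition on the size of the first loss.
    return sum(f * phi(T - 1, u + PREMIUM - x) for x, f in LOSSES)

for u in (0, 4, 6, 12):
    print(u, round(phi(2, u), 2))
```

Calling `phi(T, u)` with a larger `T` continues the same recurrence without any change to the code.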




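EXAMPLE 3. Consider the same problem but assume that the available surplus is invested at a risk free interest rate $r$. This means that the cash flow $c_1$ in (2.8.3) should include the growth of the capital, and $u + c$ above should be replaced by $(u + c)\alpha$, where $\alpha = 1 + r$. Then instead of (2.8.4), we should write

$$
\varphi_2(u) = \frac{1}{10} \left( 5\varphi_1(\alpha(u + 4) - 2) + 4\varphi_1(\alpha(u + 4) - 4) + \varphi_1(\alpha(u + 4) - 10) \right),
$$

and $\varphi_1(u)$ should also be recomputed. To make calculations illustrative, set $r = 1/9$. Now we consider all $u$'s, not only integers. The function $g(u) = \alpha(u + 4)$ equals 10 for $u = 5$, so $\varphi_1(u) = 1$ for $u \ge 5$. For $0 \le u < 5$ we have $\varphi_1(u) = 0.9$. Note also that $\alpha(u + 4) - 2 \ge 5$ if $u \ge 2.3$, and $\alpha(u + 4) - 4 \ge 5$ if $u \ge 4.1$, while $\alpha(u + 4) - 10 \ge 0$ if $u \ge 5$, and $\alpha(u + 4) - 10 \ge 5$ if $u \ge 9.5$. Thus,

$$
\begin{aligned}
\varphi_2(u) &= \tfrac{1}{10}(5 \cdot 0.9 + 4 \cdot 0.9 + 0) = 0.81 && \text{for } 0 \le u < 2.3; \\
\varphi_2(u) &= \tfrac{1}{10}(5 \cdot 1 + 4 \cdot 0.9 + 0) = 0.86 && \text{for } 2.3 \le u < 4.1; \\
\varphi_2(u) &= \tfrac{1}{10}(5 \cdot 1 + 4 \cdot 1 + 0) = 0.90 && \text{for } 4.1 \le u < 5; \\
\varphi_2(u) &= \tfrac{1}{10}(5 \cdot 1 + 4 \cdot 1 + 0.9) = 0.99 && \text{for } 5 \le u < 9.5; \\
\varphi_2(u) &= \tfrac{1}{10}(5 \cdot 1 + 4 \cdot 1 + 1) = 1 && \text{for } u \ge 9.5.
\end{aligned}
$$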
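With interest, the thresholds are no longer integers, and floating-point rounding at a threshold such as $u = 2.3$ can flip a case. The sketch below therefore uses exact rational arithmetic; the names `phi_interest`, `LOSSES`, and `PREMIUM` are hypothetical, and the timing convention (premium added first, then interest, then the loss at the end of the year) is the one described in the example.

```python
from fractions import Fraction as Fr

LOSSES = ((2, Fr(1, 2)), (4, Fr(2, 5)), (10, Fr(1, 10)))   # (x_j, f_j)
PREMIUM = 4

def phi_interest(T, u, r):
    """No-ruin probability over T periods when (u + premium) earns interest r."""
    if u < 0:
        return Fr(0)          # convention: ruin has already happened
    if T == 0:
        return Fr(1)
    grown = (Fr(u) + PREMIUM) * (1 + Fr(r))   # surplus just before the loss
    return sum(f * phi_interest(T - 1, grown - x, r) for x, f in LOSSES)

r = Fr(1, 9)
for u in (Fr(0), Fr(23, 10), Fr(41, 10), Fr(5), Fr(19, 2)):
    print(u, phi_interest(2, u, r))
```

Evaluating the function at the breakpoints themselves, as in the loop above, is a convenient way to check where each case of the piecewise formula begins.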


Nothing prevents us from continuing the recurrence procedure. Applying the same formula (2.8.3) to its interior terms, we can write

$$
\varphi_T(u) = \sum_j \varphi_{T-1}(u + c_1 - x_j)\, f_j = \sum_j \sum_i \varphi_{T-2}(u + c_2 - x_j - x_i)\, f_j f_i, \tag{2.8.5}
$$

moving in the same way up to the moment when we come to $\varphi_0(u) = 1$ for $u \ge 0$, and $= 0$ for $u < 0$. Note that $c_2$ is the cumulative cash by time 2, and again the terms inside the sum in (2.8.5) are equal to zero if $u + c_2 - x_j - x_i < 0$. Calculations may be tedious even if we write a corresponding program, but the program itself should not be too complicated.

Note also that equations (2.8.3)-(2.8.5) are the so-called backward equations: we condition the behavior of the process with respect to what may happen in the first period. Another approach may concern the so-called forward equations. In this case, we assume that there was no ruin during the first $T - 1$ periods, and consider the behavior of the process in the last period. Regarding the latter methods, see, e.g., [74, Section 7.3].


The same logic may be applied to processes in continuous time. Assume that $S_{(t)}$ is a homogeneous compound Poisson process with intensity $\lambda$. Let $m$ and $F(x)$ be the mean value and the d.f., respectively, of a separate claim. As usual, we set $c_t = ct$ with $c = (1+\theta)m\lambda$, where $\theta$ is a security loading. For simplicity, consider the case $T = \infty$.

Set $\eta = \min\{t : S_{(t)} > 0\}$, the moment of the first claim, and set $Z = S_{(\eta)}$, the value of the first claim. For the process under consideration, $\eta$ is exponential with parameter $\lambda$, the d.f. of $Z$ is $F$, and $\eta$ and $Z$ are independent.

Let again $\varphi(u) = \varphi_\infty(u)$. As before, we set $\varphi(u) = 0$ if $u < 0$.




At the moment $\eta$, the conditional no-ruin probability is equal to $\varphi_{T-\eta}(u + c\eta - Z)$. It is equal to zero if $Z > c\eta + u$.

In view of the memoryless property, at the moment $\eta$, the process starts over from the new level. Since $T = \infty$, the time horizon with respect to the new starting moment $\eta$ is again infinite. Hence,

$$
\varphi(u) = E\{\varphi(u + c\eta - Z)\}.
$$

This is an equation for $\varphi(u)$. Since we know the distributions of $Z$ and $\eta$, we can rewrite it as

$$
\varphi(u) = \int_0^\infty \int_0^\infty \varphi(u + ct - z)\,dF(z)\,\lambda e^{-\lambda t}\,dt.
$$

Since $\varphi(u) = 0$ for $u < 0$, it may be written as

$$
\varphi(u) = \lambda \int_0^\infty e^{-\lambda t} \left( \int_0^{u + ct} \varphi(u + ct - z)\,dF(z) \right) dt. \tag{2.8.6}
$$

The theory of solutions to equations of this type is well developed and uses various mathematical methods; see, e.g., [10], [19], [38], [50], [74]. None of these methods is very simple, but they give, in particular, an alternative way to obtain many results we got above by making use of the martingale or renewal approaches.
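Equation (2.8.6) can be checked numerically in the one case where the solution is known in closed form. For exponential claims with mean $m$, the ruin probability is $\psi(u) = \frac{1}{1+\theta}\exp\{-\gamma u\}$ with $\gamma = \theta/[m(1+\theta)]$, so $\varphi(u) = 1 - \psi(u)$ should reproduce itself when inserted into the right-hand side of (2.8.6). The sketch below evaluates that right-hand side by nested midpoint rules; the function name `rhs_286`, the grid sizes, the truncation of the outer integral at $t = 40/\lambda$, and the parameter values are all assumptions made for illustration.

```python
import math

def rhs_286(phi, u, lam, c, m, n_t=400, n_z=400):
    """Right-hand side of (2.8.6) for exponential claims with mean m,
    approximated by nested midpoint rules."""
    t_max = 40.0 / lam               # the neglected tail is of order e^{-40}
    dt = t_max / n_t
    total = 0.0
    for i in range(n_t):
        t = (i + 0.5) * dt
        upper = u + c * t            # the inner integral runs over [0, u + c t]
        dz = upper / n_z
        inner = 0.0
        for k in range(n_z):
            z = (k + 0.5) * dz
            inner += phi(upper - z) * math.exp(-z / m) / m * dz   # dF(z)
        total += lam * math.exp(-lam * t) * inner * dt
    return total

m, theta, lam = 1.0, 0.5, 1.0
c = (1.0 + theta) * m * lam
gamma = theta / (m * (1.0 + theta))

def phi_exact(u):
    """No-ruin probability for exponential claims."""
    return 1.0 - math.exp(-gamma * u) / (1.0 + theta)

check = rhs_286(phi_exact, 1.0, lam, c, m)
print(check, phi_exact(1.0))
```

The two printed numbers should agree up to the quadrature error, which can be reduced by increasing `n_t` and `n_z`.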


Routes 1 and2 = page 391 


3 CRITERIA CONNECTED WITH PAYING DIVIDENDS


In the situation described in the previous section, two things may happen: 


• either ruin will occur during some finite time period (for a large initial surplus and/or large premiums, the probability of this event is small), or

• the company will avoid ruin, and the surplus $R_t$ will grow without limit: $R_t \to \infty$ as $t \to \infty$. (See also condition (2.1.6).)

The last property is not realistic: no company will keep an excessive surplus of high liquidity while having an opportunity to invest a part of it or pay dividends. Moreover, the law and usual insurance practice require paying some dividends if the surplus exceeds a certain level.

On the other hand, as one may guess and as we will see below, the probability that the company will remain solvent forever would be zero unless the company allows the surplus to grow. In other words, condition (2.1.6) is essential for the ruin probability not to equal one.


3. Criteria Connected with Paying Dividends 381 


To resolve these issues, we should consider, as an alternative to the ruin probability, other quality criteria, for example, the expected discounted amount of dividends to be paid and/or the expected life of the company. The idea of using these criteria was first proposed by B. De Finetti in 1957 and was considered later by K. Borch and other scholars (see, e.g., [16], [20], [18], [138], [141]).

The goal of this section is to illustrate some ideas and results in this area. We restrict ourselves to the discrete time case and consider the model (1.3)-(1.4) from Section 1.


3.1 A general model 


Denote by $d_t$ the dividend paid at time $t = 1, 2, \dots$. The surplus process is governed by the relation

$$
R_t = u + ct - S_t - D_t, \tag{3.1.1}
$$

where $S_t = X_1 + \dots + X_t$, the claims $X_i$ are i.i.d. r.v.'s, $D_t = d_1 + \dots + d_t$, and $c$ is a premium per unit interval of time. As before, when it does not cause misunderstanding, we omit the index $i$ in $X_i$.

Assume that at the moment, if any, when $R_t < 0$, the company stops operating.

Let $v < 1$ be a discount factor. We consider an infinite time horizon, and the criterion

$$
E\left\{ \sum_{t=1}^{\infty} v^t d_t \right\}, \tag{3.1.2}
$$

the expected total amount of discounted dividends to be paid.

The variables $d_t$ represent a strategy of paying dividends. In general, since $d_t$ may depend on the history of the process until time $t$, it is a r.v.

Since $v < 1$, dividends to be paid in the future are less valuable than payments now. However, this does not mean that the company should pay large dividends in the beginning. If the company pays too much in earlier stages, this will reduce the current surplus and make an earlier ruin possible. In this case, the total amount of dividends may be small because the time of functioning will be short.

Thus, an optimal strategy should reflect a trade-off between two issues: the desire to 
pay dividends in earlier stages, and the necessity to keep the company functioning during a 
sufficiently long period. 

We will show in Section 3.3 that under some mild conditions the optimal strategy maximizing (3.1.2) has the following threshold structure.

• If at the end of an underwriting period the surplus $R_t$ exceeds an optimal threshold level $z^*$, then the amount $R_t - z^*$ is paid out as the dividend payment during that period.

• If the surplus $R_t$ is less than $z^*$, then no dividends are paid, and the company keeps the surplus $R_t$ until the next underwriting period.

In other words,

$$
d_t = \max\{R_t - z^*,\ 0\}.
$$




We will see in Section 3.3 that the optimal level $z^*$ does not depend on the initial surplus $u$. To find the level $z^*$, we consider the threshold strategy for all $z$'s and the function

$$
V(u, z) = E\left\{ \sum_{t \ge 1} v^t d_{tz} \right\}, \tag{3.1.3}
$$

where

$$
d_{tz} = \max\{R_t - z,\ 0\}.
$$

Once we know $V(u, z)$, we can try to find its maximizer in $z$, that is, the optimal level $z^*$. Since this level does not depend on $u$, we can do that for $u = 0$.

Certainly, this does not mean that the amount of dividends itself does not depend on u. 
Let us consider this in more detail. 

If the initial level $u > z$, then by definition of the strategy $d_{tz}$, the company should immediately pay off the surplus $u - z$, that is,

$$
V(u, z) = V(z, z) \quad \text{for } u > z.
$$

Let $u \le z$. Since the goal of the company is to maximize the amount of dividends, the initial surplus $u$ that the company keeps for functioning may be viewed as an investment for getting dividends in the future. Then the variable $V(u, z) - u$ may be viewed as the profit of the company. We will prove in Section 3.3 that in the case of the optimal level $z^*$,

$$
\text{the function } V(u, z^*) - u \text{ is increasing in } u \text{ when } u \le z^*. \tag{3.1.4}
$$

This means that the optimal behavior is to start with the initial surplus $u = z^*$ and proceed following the optimal threshold strategy.

The next question concerns the ruin probability. We will see that, under the above threshold strategy, it is equal to one, provided that with a positive probability the claim may exceed the premium. Such a condition is natural since otherwise nobody will pay such a premium.

Let $P(X > c + a) \ge \delta > 0$ for some $a > 0$. Let $k = [z^*/a] + 1$, where as usual $[x]$ is the integer part of $x$. Then with probability at least $\delta^k$, all claim surpluses $X_t - c$, $t = 1, \dots, k$, will be larger than $a$, and $R_k$ will be negative. If it does not happen during the first $k$ steps, then it will happen with the same positive probability during the next $k$ steps, and so on. So, the probability that ruin will ever happen is one.

(More rigorously, let

$$
A_i = \bigcap_{t = ik + 1}^{(i+1)k} \{X_t > c + a\}.
$$

For ruin to occur, it suffices that at least one of the events $A_i$ occurs. The probability of this is one, since the $A_i$'s are independent and $P(A_i) \ge \delta^k > 0$.)

The fact that in the case of a threshold strategy the ruin probability equals one is not a reason to reject the approach above: we nevertheless deal with the maximal amount of dividends. Moreover, if the time before the ruin is sufficiently large, say, larger than the time horizon for the company, the fact mentioned is not essential.




Nevertheless, we can apply a more cautious approach by introducing into consideration the expected time of ruin. Let $D(u, z)$ be this expected time for the initial surplus $u$ and the threshold level $z$. Having at its disposal both characteristics, $V(u, z)$ and $D(u, z)$, the insurer can establish a more flexible criterion. One example consists in maximizing $V(u, z)$ under the restriction

$$
D(u, z) \ge D_0,
$$

where $D_0$ is a given level determined by the preferences of the insurer.

Analytical solutions of the problems above are complicated even in simple cases (see, e.g., [16], [141]). So, we restrict ourselves to one example, namely, to the simple random walk model. Results for this model illustrate well what we can expect in more general cases. As to the general situation, it is worth emphasizing that numerical solutions based on simulation of the process $R_t$ are quite tractable and, with the use of modern software, do not present essential difficulties.


3.2 The case of the simple random walk 


Let $c = 1$, and let the size of the claim at each period be the r.v.

$$
X =
\begin{cases}
0 & \text{with probability } p, \\
2 & \text{with probability } q,
\end{cases}
$$

where $q < p$. In this case, $m = E\{X\} < 1$, and hence $c > m$.

Thus, for each period, the profit of the company is $c - X = \pm 1$ with probabilities $p$ and $q$, respectively.

Consider the threshold strategy with a level z. Assume that u and z are integers, and 
let $w_n(u,z)$ be the probability that the first dividend will be paid at the moment n. By the 
definition of the strategy we use, 

$w_0(u,z) = 0$ for $u \le z$; $w_0(u,z) = 1$ for $u > z$; 
$w_n(u,z) = 0$ for $u < 0$, since in this case the insurer is ruined in the very beginning; 
$w_n(u,z) = 0$ for $u > z$ and $n > 0$, since in this case the first payment occurred 
at the initial time. 
(3.2.1) 

We apply the first step approach in a way similar to what we did in Section 4.4.3.2. With 
probability p the process moves up, the surplus becomes u + 1, the random walk starts over, 
and the probability that a dividend will be paid at time n “becomes” $w_{n-1}(u+1,z)$. The 
same concerns the case when the process in the first step moves down. Thus, for $0 \le u \le z$ and $n \ge 1$, 

$$w_n(u,z) = p\,w_{n-1}(u+1,z) + q\,w_{n-1}(u-1,z). \qquad (3.2.2)$$


Let 

$$\hat{w}(u,z) = \sum_{n=0}^{\infty} v^n w_n(u,z), \qquad (3.2.3)$$

the generating function of the sequence of probabilities $\{w_n\}$. (See also Section 0.4.1.) We 
chose the same letter $v \in (0,1)$ as for discount on purpose. 


384 6. GLOBAL CHARACTERISTICS OF THE SURPLUS PROCESS 


Applying (3.2.2) for $0 \le u \le z$, and taking into account conditions (3.2.1), we have 

$$\hat{w}(u,z) = \sum_{n=1}^{\infty} v^n \left(p\,w_{n-1}(u+1,z) + q\,w_{n-1}(u-1,z)\right) = pv\,\hat{w}(u+1,z) + qv\,\hat{w}(u-1,z).$$

So, for the generating function we have the equation 

$$\hat{w}(u,z) = pv\,\hat{w}(u+1,z) + qv\,\hat{w}(u-1,z) \qquad (3.2.4)$$

for $0 \le u \le z$. Similarly one can get that 

$$\hat{w}(u,z) = 0 \text{ for } u < 0, \qquad \hat{w}(u,z) = 1 \text{ for } u > z. \qquad (3.2.5)$$


Setting $\hat{w}(u,z) = r^{u+1}$, where r is a number, and inserting it into (3.2.4), we see that such 
a function satisfies (3.2.4) if 

$$r = pv\,r^2 + qv. \qquad (3.2.6)$$

Thus, if $r_1$ and $r_2$ are the roots of the quadratic equation (3.2.6), the functions $r_1^{u+1}$ and 
$r_2^{u+1}$ are solutions to (3.2.4). Without going too deeply into the theory, note that then any 
solution is $\hat{w}(u,z) = c_1 r_1^{u+1} + c_2 r_2^{u+1}$, where $c_1, c_2$ are constants. To find the constants $c_1, c_2$, we 
use (3.2.5), writing $\hat{w}(z+1,z) = 1$, $\hat{w}(-1,z) = 0$. 

The second condition gives $c_2 = -c_1$, and the first then leads to the solution 

$$\hat{w}(u,z) = \frac{r_1^{u+1} - r_2^{u+1}}{r_1^{z+2} - r_2^{z+2}}. \qquad (3.2.7)$$


Next, we consider a connection between V(u,z) and $\hat{w}(u,z)$. Since u and z are integers, 
each dividend paid is equal to one. A dividend is paid if the surplus equals z + 1, and once 
a dividend is paid, the surplus becomes equal to z. The probability that paying dividends 
starts from a moment n is $w_n$, and after that “everything starts over” from the level z. Hence, 
for $u \le z$, 

$$V(u,z) = \sum_{n=0}^{\infty} w_n(u,z)\left[v^n \cdot 1 + v^n V(z,z)\right] = [1 + V(z,z)]\,\hat{w}(u,z). \qquad (3.2.8)$$

Setting u = z, we have $V(z,z) = [1 + V(z,z)]\,\hat{w}(z,z)$, from which it follows that 

$$V(z,z) = \frac{\hat{w}(z,z)}{1 - \hat{w}(z,z)}. \qquad (3.2.9)$$

Combining (3.2.8) and (3.2.9), we have 

$$V(u,z) = \frac{\hat{w}(u,z)}{1 - \hat{w}(z,z)}.$$

Substituting (3.2.7), we get eventually that for $u \le z$, 

$$V(u,z) = \frac{r_1^{u+1} - r_2^{u+1}}{\left(r_1^{z+2} - r_2^{z+2}\right) - \left(r_1^{z+1} - r_2^{z+1}\right)}. \qquad (3.2.10)$$




The denominator depends only on z, while the numerator—only on u. So, as was ex- 
pected, the optimal level z* does not depend on u. 
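The closed form (3.2.10) can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the recursion used for the check assumes that ruin occurs when the surplus reaches -1 and that a unit dividend is paid (and discounted by v per period) whenever the surplus reaches z + 1.

```python
import math

def roots(p, v):
    """Return (r1, r2), the larger and smaller roots of r = p*v*r**2 + q*v."""
    q = 1.0 - p
    disc = math.sqrt(1.0 - 4.0 * p * q * v * v)
    return (1.0 + disc) / (2.0 * p * v), (1.0 - disc) / (2.0 * p * v)

def V_closed(u, z, p, v):
    """Closed form (3.2.10) for integers 0 <= u <= z."""
    r1, r2 = roots(p, v)
    num = r1 ** (u + 1) - r2 ** (u + 1)
    den = (r1 ** (z + 2) - r2 ** (z + 2)) - (r1 ** (z + 1) - r2 ** (z + 1))
    return num / den

def V_iterated(z, p, v, sweeps=2000):
    """Fixed-point iteration of the first-step recursion for V(0..z, z)."""
    q = 1.0 - p
    V = [0.0] * (z + 1)
    for _ in range(sweeps):
        V = [
            v * (p * (V[u + 1] if u < z else 1.0 + V[z])   # step up (dividend at z + 1)
                 + q * (V[u - 1] if u > 0 else 0.0))        # step down (ruin below 0)
            for u in range(z + 1)
        ]
    return V
```

Evaluating `V_closed(0, z, 0.7, 0.9)` for z = 0, 1, 2, ... gives a quick way to see where the expected discounted dividends peak as a function of the threshold.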

We skip detailed calculations leading to an optimal z. To find it, one should take the 
derivative of the denominator in (3.2.10) with respect to z, set it equal to zero, and divide the whole equation 
by $r_2^z$. Then the unknown z will be contained only in the expression $(r_1/r_2)^z$. Solving the 
equation with respect to this expression, one can readily get that the optimal level 

$$z^* = \frac{1}{\ln(r_1) + |\ln(r_2)|}\,\ln\left(\frac{|\ln(r_2)|\; r_2(1 - r_2)}{\ln(r_1)\; r_1(r_1 - 1)}\right), \qquad (3.2.11)$$

where $r_1 > 1$ is the larger and $r_2 < 1$ is the smaller root of equation (3.2.6). (The value in 
(3.2.11) may be negative; in this case one should set z* = 0.) 
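As a rough illustration, the optimal level can be evaluated directly from the two roots of (3.2.6). The sketch assumes the form of the optimal level obtained by setting the derivative of the denominator of (3.2.10) to zero, as described above; the function names are hypothetical.

```python
import math

def roots(p, v):
    """Return (r1, r2), the larger and smaller roots of r = p*v*r**2 + q*v."""
    q = 1.0 - p
    disc = math.sqrt(1.0 - 4.0 * p * q * v * v)
    return (1.0 + disc) / (2.0 * p * v), (1.0 - disc) / (2.0 * p * v)

def z_star(p, v):
    """Optimal threshold level; negative values are replaced by zero."""
    r1, r2 = roots(p, v)
    ratio = (abs(math.log(r2)) * r2 * (1.0 - r2)) / (math.log(r1) * r1 * (r1 - 1.0))
    z = math.log(ratio) / (math.log(r1) + abs(math.log(r2)))
    return max(0.0, z)

# Example call: p = 0.7, v = 0.9
level = z_star(0.7, 0.9)
```

For p = 0.7 and v = 0.9 the value is close to the corresponding entry 1.02 of Table 1.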

Table 1 shows the values of z* for different values of p, q = 1 — p, and v. These calcu- 
lations, as well as simulation of the process with X’s having different distributions, and a 
proof of (3.1.4) were provided by Sarah Borg in her master’s thesis [18]. Note that, though 
we have assumed u,z to be integers, for a more complete picture, the case of arbitrary u,z 
was considered. 

It can be seen that for each value of p, the value of z* increases as v increases. The 
higher v is, the more the company is concerned about the future payments. So, the company 
increases the level of the surplus in order to increase the time before ruin. 

We see that z* initially increases in p, and then decreases as p gets closer to 1. It is also 
understandable. Consider the extreme case p = 1. Then with probability one there will be 
no claim, and therefore the whole surplus could be paid out as dividends. So, z* in this case 
should be zero. Then, if p is close to one, we should expect z* to be small. 


TABLE 1: Values of z* for different p and v. 

  p \ v    0.90     0.92     0.94     0.96     0.98 
  0.60     0.15     0.54     1.14     2.21     4.65 
  0.65     0.69     1.14     1.79     2.86     4.99 
  0.70     1.02     1.46     2.07     3.01     4.73 
  0.75     1.17     1.57     2.10     2.89     4.27 
  0.80     1.19     1.54     1.99     2.63     3.73 
  0.85     1.11     1.40     1.78     2.30     3.17 
  0.90     0.95     1.18     1.48     1.89     2.58 
  0.95     0.68     0.85     1.07     1.38     1.88 
  0.98     0.007    0.09     0.19     0.034    0.58 


Next, we briefly consider the expected life D(u,z) for the random walk model. Assume, 
as before, that u, z are integers, and p > q. 
The same first step approach leads to the equation 

$$D(u,z) = 1 + pD(u+1,z) + qD(u-1,z), \qquad 0 \le u \le z.$$

As can be verified by direct substitution, the solution to this equation is 

$$D(u,z) = \frac{p}{(p-q)^2}\left[\left(\frac{p}{q}\right)^{z+1} - \left(\frac{p}{q}\right)^{z-u}\right] - \frac{u+1}{p-q}.$$



A general expression for D(u,z) in terms of some series and other examples may be found 
in [16]. 
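The "direct substitution" for the expected time to ruin can be carried out numerically. The sketch below assumes the solution has the form D(u,z) = p/(p-q)^2 [(p/q)^(z+1) - (p/q)^(z-u)] - (u+1)/(p-q), with boundary conventions D(-1,z) = 0 (ruin) and D(z+1,z) = D(z,z) (a dividend returns the surplus to z); these conventions and the function names are assumptions made for the illustration.

```python
def D_closed(u, z, p):
    """Candidate closed form for the expected time to ruin, p > q = 1 - p."""
    q = 1.0 - p
    rho = p / q
    return p / (p - q) ** 2 * (rho ** (z + 1) - rho ** (z - u)) - (u + 1) / (p - q)

def recurrence_residual(u, z, p):
    """Left side minus right side of D(u) = 1 + p*D(u+1) + q*D(u-1)."""
    q = 1.0 - p
    return D_closed(u, z, p) - (1.0 + p * D_closed(u + 1, z, p) + q * D_closed(u - 1, z, p))
```

A simple sanity check is the case u = z = 0: the company survives each period with probability p, so the expected time to ruin should be 1/q.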


3.3 Finding an optimal strategy 


In this section, we prove that the optimal strategy has properties described in Section 3.1. 
Assume that the optimal strategy exists and set 


V(u) -mef Sah (3.3.1) 
t=1 

where max is over all possible strategies $\{d_1, d_2, \ldots\}$ of paying dividends, and u is an initial 
surplus. So, V(u) is the expected discounted amount of dividends under the optimal strategy. 
It is convenient to consider the function V(u) for all u, setting V(u) = 0 for all u < 0. 
We assume also that V(u) is continuous at all points u except perhaps u = 0. 

Consider a time moment t. If the company is still functioning, it has a surplus $R = R_t \ge 0$, 
and should specify its policy for the next period. The process we consider is a Markov 
process, which means, in particular, that the policy may depend on the current surplus but 
does not depend on what strategy the company chose before time t. The company again 
faces the infinite time period, and should solve the optimization problem as if it is at the 
very beginning. 

The company receives the next premium c, and pays out the claim $X = X_{t+1}$ and a dividend 
d (which perhaps equals zero). So, the surplus $R_{t+1} = R_t + c - d - X$. If, at the next 
time, the company applies the optimal strategy (which we do not know yet but assume that 
it exists), then given X, the expected discounted amount of dividends after time t + 1 will 
be $V(R_t + c - d - X)$. 

From the standpoint of the present time t, the total amount of dividends is $d + vE\{V(R + c - d - X)\}$, 
where $R = R_t$ and d cannot exceed the current surplus R. To find the optimal 
behavior in the period [t, t + 1], we should maximize the last expression in d, which leads 
to the equation 

$$V(R) = \max_{0 \le d \le R}\left[d + vE\{V(R + c - d - X)\}\right]. \qquad (3.3.2)$$


The reader familiar with the optimization theory recognizes in the above reasoning the 
so called optimality principle, and realizes that we have derived the Bellman equation. 
Let the function 

$$w(y) = vE\{V(y - X)\} - y.$$

Then 

$$V(R) = \max_{0 \le d \le R}\left[R + c + w(R + c - d)\right] = R + c + \max_{0 \le d \le R} w(R + c - d) = R + c + \max_{c \le y \le c + R} w(y), \qquad (3.3.3)$$

where we changed variables, setting y = R + c - d. 


Assume now that the function w(y) has a unique maximum at a point yo. This is an 
implicit condition we impose to find the optimal solution. Consider three cases. 




(i) $y_0 < c$. In this case, $\max_{c \le y \le c+R} w(y)$ is attained at the point y = c (graph w(y) with 
a unique maximum at $y_0$, and place c on the right of $y_0$). Then the optimal d = 
R + c - y = R. 

(ii) $c \le y_0 \le c + R$. Then $\max_{c \le y \le c+R} w(y)$ is attained at the point $y = y_0$, and the optimal 
payment $d = R + c - y_0 = R - z$, where $z = y_0 - c \ge 0$. 

(iii) $y_0 > c + R$. Then $\max_{c \le y \le c+R} w(y)$ is attained at the point y = c + R, and the optimal 
payment d = 0. 


Setting $z^* = \max(0,\, y_0 - c) \ge 0$, we see that in all three cases above, the optimal payment 

$$d = \begin{cases} R - z^* & \text{if } R \ge z^*, \\ 0 & \text{if } R < z^*. \end{cases} \qquad (3.3.4)$$
So, the fact that the optimal strategy has the threshold structure is proved. 
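The three cases can be sketched in code. The following is an illustration only: the function names are hypothetical, and the brute-force comparison uses a made-up unimodal function w(y) = -(y - y0)^2 in place of the true w, which depends on the unknown V.

```python
def optimal_dividend(R, c, y0):
    """Threshold rule (3.3.4): pay out everything above z* = max(0, y0 - c)."""
    z_star = max(0.0, y0 - c)
    return R - z_star if R >= z_star else 0.0

def brute_force_dividend(R, c, w, steps=10000):
    """Maximize w(R + c - d) over a grid of d in [0, R]."""
    best_d, best_val = 0.0, float("-inf")
    for k in range(steps + 1):
        d = R * k / steps
        val = w(R + c - d)
        if val > best_val:
            best_d, best_val = d, val
    return best_d

def make_w(y0):
    """Hypothetical unimodal w with its unique maximum at y0."""
    return lambda y: -(y - y0) ** 2
```

Comparing the two functions for values of y0 below c, between c and c + R, and above c + R reproduces cases (i), (ii), and (iii), respectively.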
To prove (3.1.4), we write (3.3.3) as 

$$V(u) - u = c + \max_{c \le y \le c+u} w(y). \qquad (3.3.5)$$

If $y_0 \le c$, then V(u) - u = c + w(c) and, hence, does not depend on u. This is natural: in this 
case the optimal dividend payment at the initial moment would be d = u, and the company 
will start from zero level. 

Let $y_0 > c$. If $u \ge z^* = y_0 - c$, then as was proved, the optimal behavior consists in 
immediate payment of the surplus u - z* as a dividend. After that the process starts from 
the level z*. 

If u < z*, there should be no dividend at the initial moment, and V(u) - u = c + w(c + u). 
In this case, $c + u < y_0$, and hence w(c + u) is increasing in u up to the moment when 
$c + u = y_0$. This is equivalent to u = z*. ∎ 


4 EXERCISES 


Sections 1 and 2 


1. Is $\psi_T(u)$, as a function of T, increasing? 


2. Consider the process R, in continuous time, and set ,(w) = P(R, > 0 for all t = 1,2,...,n), 
that is, we count only integer moments of time. Which is larger: ,(u) or ọn (u) ? 


3. Show that in general, not assuming that $R_t = u + ct - S_{(t)}$, for (2.1.6) to be true it suffices to 
require 
$E\{R_t\} \to \infty$ and $Var\{R_t\} = o([E\{R_t\}]^2)$. 
4. Look over Exercises 30-36 from Chapter 2. 


5. Problems below concern Example 2.2.2-1. 




(a) Graph the r.-h. and l.-h. sides of (2.2.12). Show that for (2.2.12) to have a positive 
solution γ < 1, the premium c should be indeed greater than 2. Check the numerical 
answers in Example 2.2.2-1. 

(b) Write a program (it suffices to provide a spreadsheet) which would allow one to compute γ 
for different c’s. Compare the results with what the approximation (2.2.16) gives. 

(c) Show that γ → 1 as c → ∞. Explain why in the case of large c, (2.1.4) is not a good 
estimate for the ruin probability. (Advice: Show that the ruin probability should vanish 
when c → ∞.) 


6. Assume that for some c > m there exists a positive solution γ to equation (2.2.11). 

(a) Proceeding from the results of Section 2.2.1 and using Figures 4ab, show at a heuristic 
level that γ → 0 as c → m from the right (being greater than m), and in this case, the 
ruin probability converges to one. Explain why it is not surprising from an economic 
point of view. 

(b) Prove that γ → 0 as c → m rigorously. 

(c) Consider the case c → ∞. Explain at a heuristic level that in this case, the ruin probability 
should converge to zero. Show that it follows from (2.1.4). (Hint: If c is very 
large, with probability close to one, the surplus at the next step will be very large.) 


7. Assume that $M_X(z)$ in (2.2.11) is defined for all z. 

(a) Assume also that a positive solution γ to equation (2.2.11) exists for all c > m. Proceeding 
from the results of Section 2.2.1 and using Figures 3-5, show at a heuristic 
level that γ → ∞ as c → ∞, and the ruin probability converges to zero. Explain why it 
is not surprising from an economic point of view. 

(b) Prove the result of Exercise 7a rigorously. 

(c) Let the r.v. X above be bounded by a number b. Explain why in this case the instance 
c > b has no economic sense. Show that in this case a positive solution to (2.2.11) does 
not exist, and that γ → ∞ as c → b if X is not degenerate. 


8. If you should solve (2.2.24) for two different a’s and the same ν, would you solve (2.2.24) 
twice or just one time? 


9. Using (2.2.16), estimate the ruin probability in the situation (2.2.20) for u = 100, θ = 0.1, $\xi_i$’s 
having the standard exponential distribution, and $K_i$’s having the geometric distribution with 
parameter p = 0.1. 


10. Estimate the adjustment coefficient in the situation of Example 2.2.4-1 for ν = 3, θ = 0.1, 
and a = 1 and 2. (Advice: First solve Exercise 8.) 


11. Find the adjustment coefficient in the situation of Example 2.2.4-1 for ν = 2. Show that for 
small θ the answer does not contradict approximation (2.2.27). (Advice: First solve Exercise 
8, and think for what a solutions to (2.2.24) are simpler.) 


12.** Making use of the result of Exercise 1.33b, show that the r.-h.s. of (2.4.1) is non-decreasing 
in γ. 

13. A flow of claims is represented by a compound Poisson process $S_{(t)}$ in continuous time. 
The mean time between adjacent claims is half an hour. The random value X of a particular 
claim is uniformly distributed on [0,10] (say, the unit of money is $1000). The initial surplus 
(capital) is 100, the relative loading coefficient θ = 0.2. 

(a) Estimate the ruin probability. 


(b) For which initial surplus is the ruin probability less than 0.05? 
(c) * Let u = 50. Find θ for which the ruin probability is less than 0.05. 


14. For a particular group of clients, a flow of claims arriving at an insurance company may be 
represented as a compound Poisson process. Let the amount of a particular claim be equal to 
either 2, 3, or 4, with probabilities 1/4, 1/2, and 1/4, respectively. Let the mean number of 
claims the company receives per day be 10. Assume that the company chooses for its activity 
a relative loading coefficient θ = 0.1. 

(a) Write an equation for the adjustment coefficient γ. 
(b) Does this equation involve λ? 
(c) Find an approximate solution using software. Compare it with approximation (2.2.26). 
(d) Find an approximate value of the initial capital for which the ruin probability for the 
company will be less than 0.03. 
(e) Think how to answer all questions above, if a particular claim is, say, uniformly distributed 
on [2,4]. Do you expect the ruin probability to be smaller? 


15. Think how to answer all questions in Exercise 14 if the number of claims during a day is 
exactly equal to 10, that is, we consider the discrete time scheme, and day is a unit of time. 


16.* In the discrete time case, for the claim X in one period, we have E{X} = 3, Var{X} = 2.5. 
The required level © = 0.05. Estimate proper combinations of the initial surplus and the 
loading coefficient for “large” u. 


17. Provide a graph illustrating a solution to (2.5.2). 


18.* (a) Without calculating anything, show that if X is a discrete r.v., the density (2.7.4) is a 
mixture of uniform distributions. 
(b) Find the density (2.7.4) for the case when X takes on only one value. 
(c) Find the density (2.7.4) for the case when X takes on values 1, 2, 3 with probabilities 
1/5, 2/5, 2/5, respectively. 


19.* Show that, if X is exponential with a parameter a, then the density (2.7.4) is exponential with 
the same parameter. Explain that this fact is non-surprising in light of what was discussed in 
Section 2.5.1. 


20.* Clarify from a heuristic point of view the significance of the fact that the density (2.7.4) is 
decreasing. 


21.* In the framework of Section 2.7, let the r.v.’s $X_i$ be continuous. Does it mean that the r.v. Zg 
is continuous? (Advice: Think about (0) and how it is connected with Zg.) 


22. In the case of the compound Poisson process, for some data the ruin probability $\psi(u) = 
0.3e^{-2u} + 0.4e^{-u/2}$. Find θ and γ. (Advice: Use the Cramér-Lundberg approximation (2.7.14) 
and (2.7.3).) 


23.** (a) Making use of (2.2.26), show that the constant C in (2.7.15) is close to one for small θ. 
(b) Making use of (2.7.14), estimate the ruin probability for large u in the case where X 
has the Γ-distribution with parameters a = 1, ν = 2, and θ = 0.1. 


24. Realize why the values of 62(u) in Examples 2.8-2 and 3 are the same, while ranges for u are 
different. 


25. Assume that in the situations of Example 2.8-2, you are asked only to find u for which 
(u) > 0.99. Realize that in this case you can avoid most of the calculations, coming to the 
answers very quickly. Find this answer. 


26. For X = 2, 10 with probabilities 0.6, 0.4, respectively, solve the problem of Example 2.8-2 
and the problem of Example 2.8-3 for r = 0.2. 


27. Show that for T < ∞, the counterpart of (2.8.6) will be 


T a ee (f° or-sw+e-ar() dte. 


(Hint: The conditional no-ruin probability given n > T, is certainly one (no claim arrived 
within the period [0,T]).) 


Section 3** 


28. Regarding the model of Section 3.1, give a common sense explanation to why one should 
expect a higher optimal level z* for higher values of discount v. 


29. Can the optimal level z* be zero? (Advice: Think about the case of small v.) 


30. In the model of Section 3.2, let p = 0.7, v = 0.9. Using Excel or other software, provide 
the graph of V(0,z). Interpret it. Why are too small and too large values of z not optimal? 
Estimate the optimal z*. Explain why for any u < z*, the maximizer of V(u,z) will be the 
same. 


31. Using Excel or other software, graph z* against p for v = 0.9. 


Chapter 7 


Survival Distributions 


We begin to consider situations when the obligations assumed by the insurance company 
are connected, in one way or another, with the lifetimes of insured units. For the most part, 
we address life insurance and annuities. 

There are two main features of such insurance mechanisms. 

The first is the same as in non-life insurance and consists in the redistribution of risk 
between clients of the insurance organization. The second feature concerns the time lag 
between the moments when the company pays benefits and the time of policy issue, that is, 
the time when the first premium is paid. In this case, the random nature of the insurance 
process is specified by the probability distributions of the lifetimes of insured units. 

In the current chapter, we consider various types of such distributions and their charac- 
teristics. Chapters 8-11 concern insurance models themselves. 


1 THE PROBABILITY DISTRIBUTION OF LIFETIME 
1.1 Survival functions and force of mortality 


Most of the results below concern the lifetimes of objects of rather general nature; for 
certainty and because this is the most important for us, as a rule we will talk about people. 
For other objects (say, machines), we will use the term “failure” rather than “death”. 

Let X be the (random) lifetime, or in another terminology, the age-at-death of a particular 
individual. Set $F(x) = P(X \le x)$, the d.f. of X. We assume that F(0) = 0, which indicates 
the fact that once an individual has been born, her/his lifetime is not equal to zero. 

When talking about an individual, we view her/him as a typical representative of some 
group of people. In many applications, such a group is homogeneous; that is, the lifetimes 
of all its members have the same distribution F. However, the homogeneity property is not 
necessary. For example, the life tables for the total population of a country (see Section 
1.5 for details) contain information about the duration of life for all citizens of the country 
under consideration who constitute, of course, a non-homogeneous group. This information 
concerns the average lifetime, and we may view it of interest when a person is chosen at 
random. 

Thus, in general, the distribution F is the average distribution $\frac{1}{n}(F_1 + \ldots + F_n)$, where 
n is the number of the members of the group, and $F_i$ is the lifetime distribution of its ith 
member. 

Let us come back to one individual and her/his lifetime X. In the theory under consideration, 
the tail of this distribution, P(X > x), is usually denoted by s(x) and is called a 
survival function. Clearly, 

$$s(x) = P(X > x) = 1 - F(x). \qquad (1.1.1)$$

Note that s(x) is a non-increasing function, and s(0) = 1 because F(0) = 0. 


EXAMPLE 1. In three different countries, typical survival functions are 

$$s(x) = [1 - x/100]^{\alpha} \text{ for } 0 \le x \le 100, \qquad (1.1.2)$$

where α = 0.5, 1, and 2, respectively, and time x is measured in years. In which country do 
people live the longest? 

First of all, since in all cases s(100) = 0, people in these 
countries do not live more than 100 years, and consequently, 
we must set s(x) = 0 for x > 100. The graphs are given in 
Fig. 1. Note that for α = 1, the distribution is uniform on 
[0, 100]. 

Certainly, for any x, the probability to survive x years is 
larger for α = 1/2 than for other α’s. For example, s(90) 
equals 0.32, 0.1, and 0.01 for α = 0.5, 1, and 2, respectively, 
so the first country is “much better” in terms of longevity. 

[FIGURE 1. Graphs of s(x) for x from 0 to 100.] 


Henceforth, we assume the d.f. F(x) to be smooth, so the distribution has density $f(x) = 
F'(x)$. For an infinitesimal interval dx we, as usual, write that 

$$P(x < X \le x + dx) = f(x)\,dx. \qquad (1.1.3)$$


Consider $P(x < X \le x + dx \mid X > x)$, the probability that the individual under consideration 
will die within the interval [x, x + dx], given that she/he has survived x years. In other 
words, this is the probability that a person of age x will die within the small interval of the 
length dx after time x. In view of (1.1.3), 

$$P(x < X \le x + dx \mid X > x) = \frac{P(x < X \le x + dx,\; X > x)}{P(X > x)} = \frac{P(x < X \le x + dx)}{P(X > x)} = \frac{f(x)\,dx}{s(x)}, \qquad (1.1.4)$$

provided that $s(x) \ne 0$. Set 

$$\mu(x) = \frac{f(x)}{s(x)}, \qquad (1.1.5)$$

again assuming that $s(x) \ne 0$. If s(x) = 0, we set $\mu(x) = \infty$ by definition. 
From (1.1.4) it follows that for $s(x) \ne 0$ 

$$P(x < X \le x + dx \mid X > x) = \mu(x)\,dx. \qquad (1.1.6)$$


In the general probability theory, the function $\mu(x)$ is called a hazard rate. In the case 
where the r.v. X is a lifetime, $\mu(x)$ is called the force of mortality of X. 

The larger $\mu(x)$, the larger the probability that a person of age x will die “soon”; i.e., 
within a small time interval. 




Since $f(x) = F'(x)$ and F(x) = 1 - s(x), we can also write that 

$$\mu(x) = -\frac{s'(x)}{s(x)}. \qquad (1.1.7)$$

Sometimes it is convenient to present (1.1.7) in the form 

$$\mu(x) = -\frac{d}{dx}\ln s(x). \qquad (1.1.8)$$
Three examples below illustrate possible situations and are relevant to the classification 


of tails considered in Section 2.1.1. 


EXAMPLE 2. Let X be exponential with parameter a. Then $s(x) = e^{-ax}$, $f(x) = ae^{-ax}$, 
and by (1.1.5), 

$$\mu(x) = a.$$

In the lack-of-memory case, the force of mortality is constant. 


EXAMPLE 3. Now consider the case when s(x) is decreasing slower than any exponential 
function; for example, s(x) is a power function. For instance, let X have the Pareto 
distribution (2.1.1.18) with θ = 1. Then $s(x) = 1/(1+x)^{\alpha}$ for some α > 0 and all $x \ge 0$, 
and by (1.1.7), 

$$\mu(x) = \frac{\alpha/(1+x)^{\alpha+1}}{1/(1+x)^{\alpha}} = \frac{\alpha}{1+x}. \qquad (1.1.9)$$

This is not a realistic model, not only for a human being but for practically any object’s lifetime, 
since in the instance (1.1.9), the older the object, the less its chances are of dying. 


EXAMPLE 4. Let now s(x) be decreasing faster than any exponential function, for 
instance, $s(x) = e^{-x^2}$. Then, by (1.1.5), 

$$\mu(x) = \frac{2xe^{-x^2}}{e^{-x^2}} = 2x \to \infty$$

as $x \to \infty$, which is much more realistic. 
e* 


Later, we will consider other examples of the force of mortality. One of the most important 
cases concerns the Gompertz-Makeham law when the force of mortality grows exponentially 
as $\mu(x) = Be^{\alpha x} + A$, where A, B, and α are parameters. We consider it in more detail 
in Section 1.6. 


Now assume that the force of mortality, or in general the hazard rate, $\mu(x)$ is given, 
and we want to find the survival function s(x). In this case, (1.1.7) may be considered an 
equation for s(x). Because s(0) = 1, the solution to this equation is 

$$s(x) = \exp\left\{-\int_0^x \mu(z)\,dz\right\}. \qquad (1.1.10)$$




(See, e.g., the mathematically similar case in Section 0.8.1 and how we obtained formula 
(0.8.1.7) from (0.8.1.3). The difference between (1.1.7) and (0.8.1.3) is only in notation and 
interpretation. In Exercise 1, the reader is suggested to verify (1.1.10) by differentiation.) 

The representation (1.1.10) is illustrative and convenient. Above all, it may be viewed 
as a generalization of the exponential distribution. If the force of mortality is constant, say, 
$\mu(x)$ equals some a > 0, then (1.1.10) implies that 

$$s(x) = \exp\left\{-\int_0^x a\,dz\right\} = e^{-ax},$$

that is, we are dealing with an exponential distribution. 

In general, the integral in the exponent in (1.1.10) is a non-linear (and non-decreasing, 
since $\mu(x) \ge 0$) function. 

It is worth emphasizing that the “survival terminology” is used only for interpretation. 
As a matter of fact, (1.1.10) is true for the distribution of any continuous positive random 
variable with hazard rate $\mu(x)$. 
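When the integral of the hazard rate is not available in closed form, the survival function can be approximated numerically. The following minimal sketch uses the trapezoid rule; the function name and the number of grid points are arbitrary choices made for the illustration.

```python
import math

def survival_from_hazard(mu, x, n=10000):
    """Approximate s(x) = exp(-integral of mu over [0, x]) by the trapezoid rule."""
    h = x / n
    integral = 0.5 * (mu(0.0) + mu(x)) * h + sum(mu(i * h) for i in range(1, n)) * h
    return math.exp(-integral)

# Constant hazard a = 0.5 gives the exponential tail exp(-0.5 * x);
# the linear hazard mu(x) = 2x gives exp(-x**2).
s_const = survival_from_hazard(lambda z: 0.5, 2.0)
s_linear = survival_from_hazard(lambda z: 2.0 * z, 1.5)
```

Both test hazards have known survival functions, so they serve as a check of the numerical scheme before it is applied to less tractable hazard rates.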


EXAMPLE 5 ([153, N30]¹). Acme Products will offer a warranty on their products for 
x years, where x is the largest integer for which there is not more than a 1% probability of 
product failure. Acme introduces a product with a hazard function for failure at time t of 
0.002t. Calculate the length of warranty that Acme will offer on this new product. 

The tail probability $s(x) = \exp\left\{-\int_0^x 0.002t\,dt\right\} = \exp\{-0.001x^2\}$. The solution to the 
inequality $s(x) \ge 0.99$ is 

$$x \le \left(\frac{-\ln(0.99)}{0.001}\right)^{1/2} \approx 3.17.$$

So, s(3) > 0.99, while s(4) < 0.99. The warranty should cover three years. 
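The warranty calculation is easy to reproduce. The sketch below simply evaluates the tail probability exp(-0.001 x^2) implied by the hazard 0.002t; the variable names are illustrative.

```python
import math

def s(x):
    """Tail probability for the hazard function 0.002 * t."""
    return math.exp(-0.001 * x * x)

# Largest x with failure probability 1 - s(x) of at most 1%
bound = math.sqrt(-math.log(0.99) / 0.001)
warranty_years = math.floor(bound)
```

The bound is a little above 3, so the largest admissible integer number of years is 3.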


If X is bounded, i.e., $X \le c$ for some c > 0, then s(c) = P(X > c) = 0, and in this case, 
by convention, we set $\mu(x) = \infty$ for $x \ge c$. 


EXAMPLE 6. Let X be uniform on [0,1]. Then s(x) = 1 - x for $x \in [0,1]$, and s(x) = 0 for 
x > 1. Using (1.1.7), it is easy to check that in this case, $\mu(x) = 1/(1 - x)$ for x < 1. Since 
$\mu(x) \to \infty$ as $x \to 1$, it is natural to set $\mu(x) = \infty$ for all $x \ge 1$. 

However, it is noteworthy that if we proceed from $\mu(x) = 1/(1 - x)$, so to speak, not 
knowing with which distribution we are dealing, then we get s(x) = 0 for x > 1, and this 
will not depend on how we define $\mu(x)$ for x > 1. Indeed, if x > 1, the r.-h.s. of (1.1.10) is 
equal to 

$$\exp\left\{-\int_0^x \mu(z)\,dz\right\} = \exp\left\{-\int_0^1 \mu(z)\,dz - \int_1^x \mu(z)\,dz\right\} = \exp\left\{-\int_0^1 \frac{dz}{1-z} - \int_1^x \mu(z)\,dz\right\} = 0,$$

since $\int_0^1 \frac{dz}{1-z} = \infty$. 


Next, consider the representation for the density f(x) given $\mu(x)$. From (1.1.10) it follows 
that 

$$f(x) = -s'(x) = \mu(x)\exp\left\{-\int_0^x \mu(z)\,dz\right\}. \qquad (1.1.11)$$


¹ Reprinted with permission of the Casualty Actuarial Society. 


Compare with the formula $\mu e^{-\mu x}$ for the exponential density with parameter μ. 


Now consider representation (1.1.10) for $x = \infty$, defining $s(\infty)$ as $\lim_{x \to \infty} P(X > x)$. We 
interpret $s(\infty)$ as the probability of “living forever”. Then $P(X < \infty)$, the probability of 
ever dying, equals $1 - s(\infty)$. If X is the lifetime of an individual, in order to be realistic, 
we should set $s(\infty) = 0$. It follows from (1.1.10) that for this to be true, we should have 
$\lim_{x \to \infty} \int_0^x \mu(z)\,dz = \infty$, or in another notation, 

$$\int_0^{\infty} \mu(z)\,dz = \infty. \qquad (1.1.12)$$

Thus, for $s(\infty) = 0$, which is equivalent to $P(X < \infty) = 1$, the integral in (1.1.12) should 
diverge. 


EXAMPLE 7. Can a force of mortality $\mu(x)$ equal $1/(1+x)^2$? Certainly not. If it were 
true, we would have had 

$$s(\infty) = \exp\left\{-\int_0^{\infty} \frac{dz}{(1+z)^2}\right\} = \exp\{-1\} = \frac{1}{e}. \qquad (1.1.13)$$

In a more general setting, condition (1.1.12) may be non-necessary. For example, in the 
case of an insurance policy covering accidental death, the company is interested only in the 
random time X of death if it is a result of an accident. In this case, such an accident may not 
happen at all, and there is nothing unnatural in the assumption $P(X < \infty) < 1$. (In previous 
chapters we called such a r.v. and its distribution defective or improper.) 


Next, we show that 

If a death or, in general, a failure may happen from several independent causes, then the 
hazard rate is the sum of the hazard rates corresponding to the separate causes. (1.1.14) 

To show what this means precisely, it suffices to consider the case of two causes. For 
example, we may distinguish the cases where death comes from natural reasons and where 
it is a result of an accident. 

Denote by $X_1$ the lifetime under the assumption that the failure results only from the first 
cause. In the example above, it would mean that when considering $X_1$, we do not take into 
account the possibility of an accident. Denote by $X_2$ the corresponding r.v. with regard to 
the second cause. Say, in the same example, $X_2$ will be the moment of the accident, if any, 
with a lethal outcome. 

Then the actual lifetime $X = \min\{X_1, X_2\}$. Assume $X_1, X_2$ to be independent and denote 
by $\mu_i(x)$ the hazard rate of $X_i$. Then, by (1.1.10) and by virtue of independence, 

$$s(x) = P(X > x) = P(\min\{X_1, X_2\} > x) = P(X_1 > x,\, X_2 > x) = P(X_1 > x)P(X_2 > x)$$

$$= \exp\left\{-\int_0^x \mu_1(z)\,dz\right\}\exp\left\{-\int_0^x \mu_2(z)\,dz\right\} = \exp\left\{-\int_0^x (\mu_1(z) + \mu_2(z))\,dz\right\}. \qquad (1.1.15)$$

Comparing this with (1.1.10), we see that 

$$\mu(z) = \mu_1(z) + \mu_2(z). \qquad (1.1.16)$$




Clearly, this may be generalized to the case of three and more causes. We consider this 
instance in greater detail in Section 2. 

Now, let us focus our attention on the rightmost member of (1.1.15). We see that for 
$s(\infty) = 0$, it suffices that (1.1.12) holds for only one of the hazard rates, $\mu_1(t)$ or $\mu_2(t)$. It 
is natural since now we can suppose that with positive probability one cause may not act 
(happen) at all. 

EXAMPLE 8. Let us come back to the above example with two death causes. Let 
$\mu_1(z) = 1$; that is, $X_1$ is standard exponential, and $\mu_2(z) = 1/(1+z)^2$. Then, as was shown 
in Example 7, $\lim_{x \to \infty} P(X_2 > x) = e^{-1}$. This may be interpreted as if with probability 1/e 
there will be no accident. In accordance with (1.1.15), the total survival function 

$$s(x) = \exp\left\{-\int_0^x \left(1 + \frac{1}{(1+z)^2}\right)dz\right\} = \exp\left\{-x - \frac{x}{1+x}\right\}, \text{ and } s(\infty) = 0.$$
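The additivity of hazard rates can be illustrated by simulation. The sketch below samples the two independent times of this example and compares the empirical tail of their minimum with exp{-x - x/(1+x)}; the sampling scheme (inverse transform for the defective accident time), the sample size, and the function names are choices made for this illustration.

```python
import math
import random

def sample_X2(rng):
    """Accident time with hazard 1/(1+z)^2; infinite with probability 1/e."""
    u = 1.0 - rng.random()            # uniform on (0, 1]
    if u <= math.exp(-1.0):
        return math.inf               # no accident ever happens
    a = -math.log(u)                  # solve exp(-x/(1+x)) = u for x
    return a / (1.0 - a)

def empirical_survival(x, n=100_000, seed=0):
    """Share of simulated lifetimes min(X1, X2) exceeding x."""
    rng = random.Random(seed)
    alive = sum(1 for _ in range(n)
                if min(rng.expovariate(1.0), sample_X2(rng)) > x)
    return alive / n

def s_total(x):
    """Survival function with hazard 1 + 1/(1+z)^2."""
    return math.exp(-x - x / (1.0 + x))
```

For moderate x the simulated share and `s_total(x)` should agree up to sampling error.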


1.2 The time-until-death for a person of a given age 


Consider a person of age x. It is customary to use, for brevity, the term “life-age-x” or the 
symbol (x). The future (remaining) lifetime, or the time-until-death after x, is denoted by 
T(x). In other words, T(x) = X - x given X > x. Hence, the distribution of T(x) is the 
conditional distribution of X - x given X > x. In particular, the survival probability 

$$P(T(x) > t) = P(X > x + t \mid X > x).$$

This is the probability that the person of age x will live at least t years more. The traditional 
notation for this conditional survival function is ${}_tp_x$. 
The corresponding d.f. 

$$P(T(x) \le t) = P(X \le x + t \mid X > x) = 1 - P(X > x + t \mid X > x) = 1 - {}_tp_x.$$

In keeping with tradition, we denote $P(T(x) \le t)$ by ${}_tq_x$. So, 

$${}_tq_x = 1 - {}_tp_x.$$

Note again that ${}_tq_x$ is the d.f. of T(x). 
Given a survival function s(x), 

$${}_tp_x = P(X > x + t \mid X > x) = \frac{P(X > x + t)}{P(X > x)} = \frac{s(x+t)}{s(x)}, \qquad (1.2.1)$$

provided $s(x) \ne 0$. 
Clearly, for a new-born (x = 0) 

$${}_tp_0 = s(t).$$

From (1.2.1) and (1.1.10) it follows that 

$${}_tp_x = \frac{s(x+t)}{s(x)} = \frac{\exp\left\{-\int_0^{x+t} \mu(z)\,dz\right\}}{\exp\left\{-\int_0^{x} \mu(z)\,dz\right\}} = \exp\left\{-\int_x^{x+t} \mu(z)\,dz\right\}. \qquad (1.2.2)$$




EXAMPLE 1. What will happen if the force of mortality is doubled? From (1.2.2) it 
follows that if the new force of mortality, say, $\mu^*(x) = 2\mu(x)$, then the new probability 
${}_tp_x^* = ({}_tp_x)^2$. A traditional example here concerns non-smokers and smokers. Assume that 
in a given country, the force of mortality for non-smokers is half that of smokers for all 
x’s. Assume that for 20-year-old non-smokers the probability of attaining age 70, that is, 
${}_{50}p_{20} = 0.95$. Then for smokers it is $(0.95)^2 = 0.9025$, so the difference is not dramatic. 
However, if for a 65-year-old non-smoker the probability to live at least 15 years more, i.e., 
${}_{15}p_{65} = 0.4$, then for smokers, this probability will be much less: $0.4^2 = 0.16$. 
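The squaring effect of a doubled force of mortality can be seen numerically. In the sketch below, the hazard rate 0.0005 e^{0.07x} is a hypothetical choice used only for illustration, and the survival probability is obtained by integrating the hazard over [x, x + t] with the trapezoid rule.

```python
import math

def t_p_x(mu, x, t, n=2000):
    """exp(-integral of mu over [x, x + t]), computed by the trapezoid rule."""
    h = t / n
    integral = (0.5 * (mu(x) + mu(x + t)) * h
                + sum(mu(x + i * h) for i in range(1, n)) * h)
    return math.exp(-integral)

def mu(x):
    """Hypothetical force of mortality, not taken from the text."""
    return 0.0005 * math.exp(0.07 * x)

def mu_doubled(x):
    return 2.0 * mu(x)

p_base = t_p_x(mu, 20, 50)             # survival from age 20 to age 70
p_doubled = t_p_x(mu_doubled, 20, 50)  # same, with the hazard doubled
```

Whatever hazard is used, `p_doubled` coincides with `p_base ** 2` up to rounding, which is the relation used in the smokers example.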


The following two facts follow practically immediately from (1.2.1)-(1.2.2). Let $\mu_x(t)$ be 
the hazard rate for the r.v. T(x), and let $f_{T(x)}(t)$ be the density of T(x). As above, let $\mu(x)$ 
be the hazard rate for X. Then 

$$\mu_x(t) = \mu(x + t), \qquad (1.2.3)$$

and 

$$f_{T(x)}(t) = \mu_x(t)\,{}_tp_x = \mu(x + t)\,{}_tp_x. \qquad (1.2.4)$$

Indeed, since ${}_tq_x$ is the d.f. of T(x), the density 

$$f_{T(x)}(t) = \frac{\partial}{\partial t}\,{}_tq_x = \frac{\partial}{\partial t}(1 - {}_tp_x) = -\frac{\partial}{\partial t}\,{}_tp_x = \mu(x + t)\exp\left\{-\int_x^{x+t} \mu(z)\,dz\right\}, \qquad (1.2.5)$$

where in the last step we differentiated (1.2.2) with respect to t. Since the exponent above 
equals ${}_tp_x$, we have $f_{T(x)}(t) = \mu(x + t)\,{}_tp_x$. On the other hand, by the definition of a hazard 
rate, 

$$\mu_x(t) = \frac{f_{T(x)}(t)}{P(T(x) > t)} = \frac{f_{T(x)}(t)}{{}_tp_x} = \frac{\mu(x + t)\,{}_tp_x}{{}_tp_x} = \mu(x + t),$$

which implies (1.2.3). 


Other frequently used probability characteristics are defined and denoted as follows: 

$q_x = {}_1q_x = P(T(x) \le 1)$, the probability that a life-age-x will die within one year; 
$p_x = {}_1p_x = P(T(x) > 1)$, the probability that a life-age-x will live at least one year more; 
${}_{t|u}q_x = P(t < T(x) \le t + u) = P(x + t < X \le x + t + u \mid X > x)$. 

Following traditional notation, we omit the 1 in ${}_{t|1}q_x$, writing ${}_{t|}q_x$. 
Clearly, 

$${}_{t|u}q_x = P(T(x) \le t + u) - P(T(x) \le t) = {}_{t+u}q_x - {}_tq_x = {}_tp_x - {}_{t+u}p_x, \qquad (1.2.6)$$

since ${}_tp_x = 1 - {}_tq_x$. 
EXAMPLE 2. (a) Prove that for any t > 1, 
tPx = Px’ t-1Px+1- (1.2.7) 


From a heuristic point of view, this is almost obvious. To attain age x + t, the person of age
x should survive the first year (the probability of this is $p_x$) and after that, being x + 1 years
old, the person should live at least t − 1 years. The formal proof is as follows:


    ${}_tp_x = P(X > x+t \mid X > x) = \frac{P(X > x+t)}{P(X > x)} = \frac{P(X > x+1)}{P(X > x)} \cdot \frac{P(X > x+t)}{P(X > x+1)}$

    $= P(X > x+1 \mid X > x)\,P(X > x+1+t-1 \mid X > x+1) = p_x \cdot {}_{t-1}p_{x+1}$.


It does not make sense, however, to carry out such formal calculations each time. In many 
problems below, it is sufficient to reason as we did in the beginning of this example. 
(b) In Exercise 13, in a similar way, we prove that for integer t = 1, 2, ...,

    ${}_tp_x = p_x \cdot p_{x+1} \cdot \ldots \cdot p_{x+t-1}$    (1.2.8)

(where $p_x = {}_1p_x$).
(c) Prove that

    ${}_{t|u}q_x = {}_tp_x \cdot {}_uq_{x+t}$.    (1.2.9)

We could apply, for example, (1.2.6) and (1.2.1), but a more illustrative approach is to use
the same logic as above. For the event $\{t < T(x) \le t+u\}$ to occur, the person of age x
should first attain the age x + t (the probability of this is ${}_tp_x$) and after that, being x + t
years old, the person should die within u years. The probability of the latter event is ${}_uq_{x+t}$,
and the two probabilities mentioned should be multiplied. A formal proof runs similarly to
what we did above, and we skip it.


EXAMPLE 3. Let us return to the situation of Example 1.1-1.

(a) Find ${}_{60}p_{20}$. By (1.2.1), ${}_{60}p_{20} = s(80)/s(20)$. For $\alpha = 2$, we have

    $\frac{s(80)}{s(20)} = \frac{(0.2)^2}{(0.8)^2} = 0.0625$,

while for $\alpha = 1/2$ the fraction $\frac{s(80)}{s(20)} = \frac{\sqrt{0.2}}{\sqrt{0.8}} = 0.5$. In the latter case, on the
average, half of the population of 20-year-old people will attain the age of 80, while in the
former case the corresponding share is less than 7%.

(b) Find ${}_{60|10}q_{20}$ for $\alpha = 2$. By (1.2.6),

    ${}_{60|10}q_{20} = {}_{60}p_{20} - {}_{70}p_{20} = \frac{s(80)}{s(20)} - \frac{s(90)}{s(20)} = \frac{(0.2)^2}{(0.8)^2} - \frac{(0.1)^2}{(0.8)^2} = \frac{3}{64}$.

EXAMPLE 4. Find ${}_5p_{30}$ if the force of mortality $\mu(x) = 1/70$ for all x's. We saw that a
constant force of mortality corresponded to the exponential distribution with the parameter
equal to the (single) value of $\mu(x)$. Thus, X is exponential with parameter $a = 1/70$. In
view of the lack-of-memory property, ${}_tp_x = P(X > x+t \mid X > x) = P(X > t) = s(t) = e^{-at}$.
Thus, ${}_5p_{30} = \exp\{-\frac{1}{70} \cdot 5\} \approx 0.931$.
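As a quick numerical check of this example, the following sketch evaluates the survival probability under a constant force of mortality; the function name is ours. Note that the age x does not enter the computation at all, which is exactly the lack-of-memory property.

```python
import math


def t_p_x_constant_force(t: float, mu: float) -> float:
    """t_p_x = exp(-mu * t) when the force of mortality is the constant mu.

    The current age x is not an argument: under a constant force the
    remaining lifetime has the same distribution at every age.
    """
    return math.exp(-mu * t)


print(round(t_p_x_constant_force(5, 1 / 70), 3))
```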

To what extent is the exponential distribution model realistic? Certainly, we cannot
assume that the lack-of-memory property is true for the total lifetime: how long a person will
live does depend on her/his age. However, when we consider, for a young person (say, age 30
as above), the probability that she/he will not die within a fixed and relatively short period
of time (say, 5 years), the assumption of a constant mortality rate is not artificial, since the
causes of death in this case are weakly related to age. The reader may look at the graph of a real
force of mortality in Section 1.5.1. We continue this discussion in Exercise 21.

EXAMPLE 5 ([158, N1]²). For an individual who is currently age 25, $\mu(x) = \frac{1}{110 - x}$ for
$0 \le x < 110$. Calculate the expected number of years lived between ages 30 and 70 for that
individual.

We proceed from the following two simple facts, which we formally prove in Exercises 7a-7b.
First, the above type of mortality force corresponds to a uniform distribution; in our case,
on [0,110]. Secondly, for a uniform X, the remaining lifetime T(x) is also a uniform r.v.;
in our case, T = T(25) is uniform on [0,85]. The number of years mentioned is
the r.v. S = S(T) equal to 0 if $T \le 5$; equal to T − 5 if $5 < T \le 45$; and equal to 40 if T > 45.
Hence,

    $E\{S\} = \int_5^{45} (t - 5)\,\frac{1}{85}\,dt + 40 \cdot \frac{40}{85} \approx 28.24$.
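The value in this example can be confirmed numerically. The sketch below assumes, as in the example, a lifetime uniform on [0, ω]; the function and its parameter names are ours, and the integral is approximated by the midpoint rule rather than computed in closed form.

```python
def expected_years_between(age_now: float, start: float, end: float,
                           omega: float, steps: int = 100_000) -> float:
    """E{S} for a lifetime uniform on [0, omega], by the midpoint rule.

    S is the number of years lived between ages `start` and `end`;
    T = T(age_now) is uniform on [0, omega - age_now].
    """
    horizon = omega - age_now
    lo, hi = start - age_now, end - age_now
    h = horizon / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        years = min(max(t - lo, 0.0), hi - lo)  # S(t): 0, then t - lo, capped at hi - lo
        total += years * h / horizon            # the density of T is 1 / horizon
    return total


print(round(expected_years_between(25, 30, 70, 110), 2))
```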


Next, we consider the mean future lifetime $E\{T(x)\}$. It is called a complete-expectation-of-life
and in actuarial calculations is denoted by $\mathring{e}_x$.
Since ${}_tq_x$, as a function of t, is the d.f. of T(x), formally we can write

    $\mathring{e}_x = E\{T(x)\} = \int_0^\infty t\,d\,{}_tq_x$,

where d denotes differentiating with respect to t. One can calculate the last integral directly
by using the formulas ${}_tq_x = 1 - {}_tp_x$ and (1.2.2). However, it is more convenient to use
formula (0.2.2.2) and write

    $E\{T(x)\} = \int_0^\infty P(T(x) > t)\,dt = \int_0^\infty {}_tp_x\,dt$.

From this and (1.2.1), using the variable change y = x + t, we get

    $E\{T(x)\} = \int_0^\infty \frac{s(x+t)}{s(x)}\,dt = \frac{1}{s(x)}\int_0^\infty s(x+t)\,dt = \frac{1}{s(x)}\int_x^\infty s(y)\,dy$.    (1.2.10)


EXAMPLE 6. Find $\mathring{e}_{50}$ if $s(x) = (1 - x/100)^2$ for $x \le 100$. By (1.2.10),
$E\{T(50)\} = \frac{1}{s(50)}\int_{50}^\infty s(y)\,dy$. Since s(y) = 0 for y > 100 (see Example 1.1-1), we have

    $E\{T(50)\} = \frac{1}{s(50)}\int_{50}^{100} s(y)\,dy = \frac{1}{(0.5)^2}\int_{50}^{100} (1 - y/100)^2\,dy$.

The change of variable u = 1 − y/100 will lead to $E\{T(50)\} = 50/3$. So, in the case under
consideration, 50-year-old people live on the average only $16\frac{2}{3}$ years more. (See comments
in Example 1.1-1.)
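Formula (1.2.10) is easy to evaluate numerically when the integral is not convenient to take by hand. The following is a minimal sketch using the midpoint rule; the function names and the step count are our choices, and the survival function is the one from this example.

```python
def complete_expectation(s, x: float, upper: float, steps: int = 200_000) -> float:
    """Approximate (1.2.10): (1 / s(x)) * integral of s(y) over [x, upper].

    `upper` should be an age beyond which s vanishes (or is negligible).
    """
    h = (upper - x) / steps
    integral = sum(s(x + (i + 0.5) * h) for i in range(steps)) * h
    return integral / s(x)


def s_quadratic(y: float) -> float:
    """Survival function of the example: s(y) = (1 - y/100)^2 for y <= 100."""
    return (1.0 - y / 100.0) ** 2 if y < 100.0 else 0.0


print(complete_expectation(s_quadratic, 50.0, 100.0))  # close to 50/3
```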


EXAMPLE 7. Find $\mathring{e}_x$ in the situation of Example 4. As was shown, we have the
exponential distribution with parameter 1/70. Hence, $E\{X\} = 70$. By virtue of the
lack-of-memory property, $E\{T(x)\} = E\{X\}$ for any x. Hence, $\mathring{e}_x = 70$. Certainly, now the example
is artificial even for small x, since we are computing the whole life duration; see also
Exercise 21.

²Reprinted with permission of the Casualty Actuarial Society.


To compute $Var\{T(x)\}$, we first compute $E\{T^2(x)\}$. Again, one can do it directly, but it
is more convenient to use (0.2.2.2) in the following way:

    $E\{T^2(x)\} = \int_0^\infty P(T^2(x) > t)\,dt = \int_0^\infty P(T(x) > \sqrt{t})\,dt = \int_0^\infty {}_{\sqrt{t}}\,p_x\,dt$.    (1.2.11)

With the change of variable $u = \sqrt{t}$, we get

    $E\{T^2(x)\} = \int_0^\infty 2u\,{}_up_x\,du = \frac{2}{s(x)}\int_0^\infty u\,s(x+u)\,du$.    (1.2.12)

For the variance, we write $Var\{T(x)\} = E\{T^2(x)\} - (E\{T(x)\})^2$. In Exercise 29, we
consider a particular example.
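To illustrate (1.2.10) and (1.2.12) together, here is a small sketch that computes both moments by the midpoint rule. As an illustration of our own (not the example of Exercise 29), we reuse the survival function $s(x) = (1 - x/100)^2$ of Example 6 at age 50; all names are hypothetical.

```python
def lifetime_moments(s, x: float, upper: float, steps: int = 200_000):
    """Return (E{T(x)}, E{T^2(x)}) computed from the survival function s.

    The first moment is the integral of u_p_x du; the second moment is the
    integral of 2 * u * u_p_x du, as in (1.2.12). Here u_p_x = s(x+u) / s(x).
    """
    h = (upper - x) / steps
    first = 0.0
    second = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        p = s(x + u) / s(x)
        first += p * h
        second += 2.0 * u * p * h
    return first, second


def s_quadratic(y: float) -> float:
    """Illustrative survival function s(y) = (1 - y/100)^2 on [0, 100]."""
    return (1.0 - y / 100.0) ** 2 if y < 100.0 else 0.0


mean, second_moment = lifetime_moments(s_quadratic, 50.0, 100.0)
variance = second_moment - mean ** 2
print(mean, second_moment, variance)
```

For this survival function the integrals can also be taken by hand, which gives a mean of 50/3 and a variance of 1250/9, so the sketch can be checked against exact values.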


1.3 Curtate-future-lifetime 


Often people count only the number of complete years survived, that is, the integer part
of T(x). This characteristic is called a curtate-future-lifetime and is denoted by K(x). By
virtue of (1.2.9),

    $P(K(x) = k) = P(k \le T(x) < k+1) = {}_kp_x \cdot {}_1q_{x+k} = {}_kp_x \cdot q_{x+k}$    (1.3.1)

(we omit the prefix 1 in ${}_1q_{x+k}$).
The mean $E\{K(x)\}$ is denoted by $e_x$. By (1.3.1),

    $e_x = E\{K(x)\} = \sum_{k=0}^\infty k\,P(K(x) = k) = \sum_{k=0}^\infty k\,{}_kp_x\,q_{x+k}$.

Sometimes it is more convenient to use formula (0.2.2.3), which gives

    $E\{K(x)\} = \sum_{k=0}^\infty P(K(x) > k) = \sum_{k=0}^\infty P(T(x) \ge k+1)$.

Since T(x) is a continuous r.v., $P(T(x) = k+1) = 0$, and hence $P(T(x) \ge k+1) = P(T(x) > k+1) = {}_{k+1}p_x$.
Eventually, by the variable change n = k + 1,

    $E\{K(x)\} = \sum_{k=0}^\infty {}_{k+1}p_x = \sum_{n=1}^\infty {}_np_x$.    (1.3.2)


EXAMPLE 1. Let X be exponential with $E\{X\} = r$. We know that such an example is
artificial, and we are only considering it for illustrative purposes. As shown in Example
1.2-7, by the memoryless property, $\mathring{e}_x = E\{T(x)\} = E\{X\} = r$. By the same property,
${}_np_x = P(T(x) > n) = P(X > n) = e^{-n/r}$. In view of (1.3.2),

    $e_x = \sum_{n=1}^\infty {}_np_x = \sum_{n=1}^\infty e^{-n/r} = \frac{e^{-1/r}}{1 - e^{-1/r}} = \frac{1}{e^{1/r} - 1}$    (1.3.3)

as a geometric series. Thus, the rounding procedure changes the expectation but not much.
For example, if r = 70, then $\mathring{e}_x = 70$, while (1.3.3) gives $e_x \approx 69.501$.
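The closed form (1.3.3) can be compared with a direct partial sum of (1.3.2). The sketch below does this for the exponential case; the function names and the number of terms are our choices.

```python
import math


def curtate_expectation_closed_form(r: float) -> float:
    """Closed form (1.3.3): e_x = 1 / (exp(1/r) - 1) for an exponential lifetime."""
    return 1.0 / math.expm1(1.0 / r)


def curtate_expectation_by_sum(r: float, terms: int = 10_000) -> float:
    """Partial sum of (1.3.2) with n_p_x = exp(-n/r); the tail is negligible for r = 70."""
    return sum(math.exp(-n / r) for n in range(1, terms + 1))


r = 70.0
print(round(curtate_expectation_closed_form(r), 3))
print(round(curtate_expectation_by_sum(r), 3))
```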




1.4 Survivorship groups 


Consider now a group of $l_0$ newborns. In practice, demographers usually set $l_0 = 100,000$.
Assume all lifetimes to be mutually independent with the same survival function s(x). Let
$\mathcal{L}(x)$ be the number of survivors to age x, and let $l_x = E\{\mathcal{L}(x)\}$.

We can view the survival to age x of a particular newborn as a success which happens
with probability s(x), and $\mathcal{L}(x)$ as the number of successes in $l_0$ independent trials. Hence,
$\mathcal{L}(x)$ has the binomial distribution with parameters s(x) and $l_0$. Then

    $l_x = l_0\,s(x)$,    (1.4.1)
    $Var\{\mathcal{L}(x)\} = l_0\,s(x)(1 - s(x))$,
    $P(\mathcal{L}(x) = k) = \binom{l_0}{k} s^k(x)(1 - s(x))^{l_0 - k}$.

From (1.4.1) we have

    $s(x) = l_x / l_0$,    (1.4.2)

which provides a way of estimating the survival function. For instance, if in a particular
homogeneous group of 100,000 newborns, 96,381 persons survived 30 years, we may
suppose that $s(30) \approx 0.96$. We omit here more precise statistical calculations such as those
of confidence intervals, etc.


Now, denote by ${}_n\mathcal{D}(x)$ the number of deaths that occurred in the time interval (x, x+n],
and set ${}_nd_x = E\{{}_n\mathcal{D}(x)\}$. The probability that a particular person will die within the interval
mentioned is $P(x < X \le x+n) = s(x) - s(x+n)$. So, similar to the argument above, ${}_n\mathcal{D}(x)$
has the binomial distribution with parameters $s(x) - s(x+n)$ and $l_0$. In particular, in view
of (1.4.1),

    ${}_nd_x = l_0[s(x) - s(x+n)] = l_x - l_{x+n}$.

Set $d_x = {}_1d_x$ and observe that due to (1.4.2), the force of mortality

    $\mu(x) = -\frac{s'(x)}{s(x)} = -\frac{l'_x}{l_x}$,

where $l'_x$ is the derivative of $l_x$ in x. For a small interval $[x, x+\delta]$, the quantity $(l_{x+\delta} - l_x)/\delta$
may be considered an estimate of $l'_x$. If in a particular situation, we view $\delta = 1$ as small, the
increment $l_{x+1} - l_x = -d_x$ may be viewed as an estimate of $l'_x$. Thus, we have arrived at the
estimate

    $\mu(x) \approx \frac{d_x}{l_x} = \frac{l_x - l_{x+1}}{l_x}$.    (1.4.3)

We see that such an estimate coincides with $q_x$.

For instance, if as in the example above, from 100,000 newborns, 96,381 survived 30
years, and if 120 individuals died between 30 and 31 years, then we may write that

    $\mu(30) \approx \frac{120}{96381} \approx 0.0012$.

In Section 1.5.2, we consider the estimation of $\mu(x)$ and interpolation of $\mu(x)$ within integer
years in greater detail.




Next, note that from (1.4.2) and (1.2.1) it follows that

    ${}_np_x = \frac{l_{x+n}}{l_x}$.    (1.4.4)

Assume, for instance, that in the situation of the example above, in a group of 100,000
newborns, 95,301 survived 35 years. Then the probability that a 30-year-old person will
live at least 5 years more, that is, ${}_5p_{30}$, may be estimated as $(95301/96381) \approx 0.988$.
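The three estimates of this section, (1.4.2), (1.4.3), and (1.4.4), amount to simple ratios of survivor counts. The following sketch collects them; the function names are ours, and the counts are those of the running example.

```python
def estimate_survival(l_x: float, l_0: float) -> float:
    """Estimate of s(x) by (1.4.2): the share of the initial group alive at age x."""
    return l_x / l_0


def estimate_force(d_x: float, l_x: float) -> float:
    """Estimate of mu(x) by (1.4.3): deaths during the year over survivors to age x."""
    return d_x / l_x


def estimate_n_p_x(l_x_plus_n: float, l_x: float) -> float:
    """Estimate of n_p_x by (1.4.4)."""
    return l_x_plus_n / l_x


l_0, l_30, d_30, l_35 = 100_000, 96_381, 120, 95_301  # counts from the example
print(estimate_survival(l_30, l_0))  # s(30)
print(estimate_force(d_30, l_30))    # mu(30)
print(estimate_n_p_x(l_35, l_30))    # 5_p_30
```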


1.5 Life tables and interpolation 
1.5.1 Life tables 


Below we consider “Life table for the total population: United States, 2002”.³ It
represents estimates of survival probabilities and other characteristics, based on the data on the
entire population in the years around 2002. Traditionally, these estimates are given in terms
of characteristics $l_x$ and $d_x$, but it does not mean that just some particular 100,000 newborns
were observed until the last died.

The table is reproduced from the “National Vital Statistics Reports” [3] of the National 
Center for Health Statistics; see also references there. One may find many interesting facts 
in such reports. 

The definitions of the positions in the table are also cited from [3]. The values in the table 
concern only entire years. In the next section, we consider some possible interpolations for 
fractional years. 

Column 1 shows the age interval between the two ages indicated.

Column 2 shows the probability (more precisely an estimate of this probability) of dying 
within the interval indicated in Column 1. The figures in this column are a result of the 
analysis of some data and form the basis of the life table. All subsequent columns are 
derived from them. 

Column 3 contains the numbers $l_x$ of persons from the original synthetic (virtual,
imagined) cohort of 100,000 live births, who survive to the beginning of each interval. The
calculation of $l_x$ is based on (1.4.2) and (1.2.8). Having the $q_x$'s, we know $p_x = 1 - q_x$ and we
can compute $l_x = l_0\,p_0 \cdot \ldots \cdot p_{x-1}$. For example, $l_2 = 10^5(1 - 0.006971)(1 - 0.000472) =
99256.02903 \approx 99256$. As a matter of fact, we compute an estimate of the expected value
of the number of survivors using only its integer part by tradition.
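The construction of Column 3 from Column 2 can be sketched as follows. The function name is ours; the three values of $q_x$ are the first entries of Column 2 in Table 1.

```python
def survivors_from_q(q, l_0: float = 100_000.0):
    """Build l_0, l_1, ..., l_n from one-year death probabilities q_0, ..., q_{n-1}.

    Each step applies l_{x+1} = l_x * (1 - q_x), which is (1.2.8) combined
    with (1.4.2).
    """
    l = [l_0]
    for q_x in q:
        l.append(l[-1] * (1.0 - q_x))
    return l


q_first_years = [0.006971, 0.000472, 0.000324]  # ages 0-1, 1-2, 2-3 from Table 1
l = survivors_from_q(q_first_years)
print([round(value) for value in l])  # rounded values, to compare with Column 3
```

Keeping the unrounded products and rounding only for display mirrors the remark about Column 4: differences of rounded $l_x$ need not coincide with rounded $d_x$.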

Column 4 shows the number $d_x$ of persons dying in each successive age interval out
of the original 100,000 births. Formally, we compute the corresponding expected value
(more precisely, its estimate). Clearly, $d_x = l_x - l_{x+1}$, although the reader may notice that,
for example, in the rows 6-7 and 7-8 we see that 99,163 − 99,148 = 15 while in the next
column this difference is estimated as 14. This is a result of round-off of the original
products of probabilities.

We just quite briefly touch on the characteristics in Columns 5-6. 


³There are more recent tables, but they do not differ much from this table, which was also
used in the first edition. For the convenience of readers who studied from the first edition,
we keep this table here too.




TABLE 1. Life table for the total population: United States, 2002. The table is reproduced 
from “National Vital Statistics Reports” [3] of the National Center for Health Statistics. 


Age interval x to x+1 (Column 1) | Probability of dying between ages x and x+1, $q_x$ (Column 2) | Number surviving to age x, $l_x$ (Column 3) | Total number of person-years lived above age x, $T_x$ (Column 6) | Force of mortality (exponential interpolation), $\mu(x)$ (Column 8)
0-1 | 0.006971 | 100 000 | 7 725 787 | 0.006995
1-2 | 0.000472 | 99 303 | 7 626 399 | 0.000472
2-3 | 0.000324 | 99 256 | 7 527 119 | 0.000324
3-4 | 0.000239 | 99 224 | 7 427 879 | 0.000239
4-5 | 0.000203 | 99 200 | 7 328 667 | 0.000203
5-6 | 0.000176 | 99 180 | 7 229 477 | 0.000176
6-7 | 0.000144 | 99 163 | 7 130 306 | 0.000144
7-8 | 0.000142 | 99 148 | 7 031 151 | 0.000142
8-9 | 0.000152 | 99 134 | 6 932 009 | 0.000152
9-10 | 0.000145 | 99 119 | 6 832 883 | 0.000145
10-11 | 0.000151 | 99 105 | 6 733 771 | 0.000151
11-12 | 0.000153 | 99 090 | 6 634 674 | 0.000153
12-13 | 0.000186 | 99 075 | 6 535 592 | 0.000186
13-14 | 0.000225 | 99 056 | 6 436 526 | 0.000225
14-15 | 0.000266 | 99 034 | 6 337 481 | 0.000266
15-16 | 0.000346 | 99 008 | 6 238 460 | 0.000346
16-17 | 0.000573 | 98 973 | 6 139 470 | 0.000573
17-18 | 0.000680 | 98 917 | 6 040 525 | 0.000680
18-19 | 0.000849 | 98 849 | 5 941 642 | 0.000849
19-20 | 0.000942 | 98 765 | 5 842 835 | 0.000942
20-21 | 0.000934 | 98 672 | 5 744 116 | 0.000934
21-22 | 0.000985 | 98 580 | 5 645 490 | 0.000985
22-23 | 0.000939 | 98 483 | 5 546 958 | 0.000939
23-24 | 0.000949 | 98 391 | 5 448 521 | 0.000949
24-25 | 0.000948 | 98 297 | 5 350 177 | 0.000948
25-26 | 0.000930 | 98 204 | 5 251 927 | 0.000930
26-27 | 0.000953 | 98 113 | 5 153 768 | 0.000953
27-28 | 0.000913 | 98 019 | 5 055 703 | 0.000913
28-29 | 0.000940 | 97 930 | 4 957 728 | 0.000940
29-30 | 0.000994 | 97 838 | 4 859 845 | 0.000994
30-31 | 0.001024 | 97 740 | 4 762 056 | 0.001025
31-32 | 0.001063 | 97 640 | 4 664 365 | 0.001064
32-33 | 0.001061 | 97 536 | 4 566 777 | 0.001062
33-34 | 0.001185 | 97 433 | 4 469 293 | 0.001186
34-35 | 0.001251 | 97 317 | 4 371 917 | 0.001252
35-36 | 0.001369 | 97 196 | 4 274 661 | 0.001370
36-37 | 0.001454 | 97 063 | 4 177 532 | 0.001455
37-38 | 0.001568 | 96 922 | 4 080 540 | 0.001569
38-39 | 0.001718 | 96 770 | 3 983 694 | 0.001719
39-40 | 0.001913 | 96 603 | 3 887 008 | 0.001915
40-41 | 0.002072 | 96 419 | 3 790 497 | 0.002074
41-42 | 0.002236 | 96 219 | 3 694 178 | 0.002239
42-43 | 0.002357 | 96 004 | 3 598 067 | 0.002360
43-44 | 0.002634 | 95 777 | 3 502 177 | 0.002637
44-45 | 0.002826 | 95 525 | 3 406 525 | 0.002830
45-46 | 0.003061 | 95 255 | 3 311 135 | 0.003066
46-47 | 0.003301 | 94 964 | 3 216 026 | 0.003306
47-48 | 0.003509 | 94 650 | 3 121 219 | 0.003515
48-49 | 0.003888 | 94 318 | 3 026 735 | 0.003896
49-50 | 0.004134 | 93 951 | 2 932 600 | 0.004143




TABLE 1 (continued). 


Age interval x to x+1 (Column 1) | Probability of dying between ages x and x+1, $q_x$ (Column 2) | Total number of person-years lived above age x, $T_x$ (Column 6) | Force of mortality (exponential interpolation), $\mu(x)$ (Column 8)
50-51 | 0.004422 | 2 838 843 | 0.004432
51-52 | 0.004822 | 2 745 487 | 0.004834
52-53 | 0.005003 | 2 652 563 | 0.005016
53-54 | 0.005549 | 2 560 094 | 0.005564
54-55 | 0.005845 | 2 468 114 | 0.005862
55-56 | 0.006719 | 2 376 658 | 0.006742
56-57 | 0.006616 | 2 285 776 | 0.006638
57-58 | 0.007621 | 2 195 500 | 0.007650
58-59 | 0.008344 | 2 105 866 | 0.008379
59-60 | 0.009429 | 2 016 948 | 0.009474
60-61 | 0.009747 | 1 928 820 | 0.009795
61-62 | 0.010877 | 1 841 536 | 0.010937
62-63 | 0.011905 | 1 755 153 | 0.011976
63-64 | 0.012956 | 1 669 753 | 0.013041
64-65 | 0.014099 | 1 585 414 | 0.014199
65-66 | 0.015308 | 1 502 217 | 0.015426
66-67 | 0.016474 | 1 420 242 | 0.016611
67-68 | 0.018214 | 1 339 569 | 0.018382
68-69 | 0.019623 | 1 260 295 | 0.019818
69-70 | 0.021672 | 1 182 520 | 0.021910
70-71 | 0.023635 | 1 106 350 | 0.023919
71-72 | 0.025641 | 1 031 905 | 0.025975
72-73 | 0.027663 | 959 294 | 0.028053
73-74 | 0.030539 | 888 616 | 0.031015
74-75 | 0.033276 | 819 994 | 0.033842
75-76 | 0.036582 | 753 560 | 0.037268
76-77 | 0.039775 | 689 444 | 0.040588
77-78 | 0.043338 | 627 775 | 0.044305
78-79 | 0.047219 | 568 666 | 0.048370
79-80 | 0.052518 | 512 230 | 0.053947
80-81 | 0.057603 | 458 606 | 0.059329
81-82 | 0.062260 | 407 930 | 0.064283
82-83 | 0.071461 | 360 288 | 0.074143
83-84 | 0.073437 | 315 825 | 0.076273
84-85 | 0.084888 | 274 581 | 0.088709
85-86 | 0.093123 | 236 593 | 0.097748
86-87 | 0.101914 | 201 979 | 0.107489
87-88 | 0.111270 | 170 733 | 0.117962
88-89 | 0.121196 | 142 810 | 0.129193
89-90 | 0.131694 | 118 125 | 0.141211
90-91 | 0.142761 | 96 552 | 0.154039
91-92 | 0.154390 | 77 931 | 0.167697
92-93 | 0.166569 | 62 069 | 0.182204
93-94 | 0.179282 | 48 744 | 0.197576
94-95 | 0.192507 | 37 716 | 0.213821
95-96 | 0.206215 | 28 730 | 0.230943
96-97 | 0.220375 | 21 530 | 0.248942
97-98 | 0.234947 | 15 859 | 0.267810
98-99 | 0.249887 | 11 474 | 0.287531
99-100 | 0.265146 | 8 148 | 0.308083
100 and over | 1.00000 | 5 675 | 1.000000




Column 5 shows the number of person-years lived by the synthetic cohort within the
corresponding interval. More precisely,

    $L_x = l_0\int_0^1 t\,dF(x+t) + l_{x+1} = \int_0^1 t\,d(-l_{x+t}) + l_{x+1} = \int_0^1 l_{x+t}\,dt$.

(The integration above is over t, with the last step consisting in integration by parts.) The
figures given in Column 5 are a result of approximate computations of the integral based
on values of $l_x$ at integer points.

Column 6 shows the number of person-years that would be lived after the beginning of
the corresponding age interval. More precisely,

    $T_x = l_0\int_0^\infty t\,dF(x+t) = \int_0^\infty t\,d(-l_{x+t}) = \int_0^\infty l_{x+t}\,dt$.

(Again, integration is over t, and the last step consists in integrating by parts.) The figures
given are a result of an approximation for the above integral.


Column 7 shows the life expectation at the beginning of the corresponding interval. The 
figures are based on (1.3.2). 


Column 8 has been added and is not contained in the official table from [3]. We saw in
Section 1.4 that one of the possible estimates for $\mu(x)$ was given by (1.4.3) and coincided
with the estimate for $q_x$ in Column 2. It corresponds to the so-called linear interpolation
which we discuss in the next Section 1.5.2.

For comparison, we give in Column 8 an alternative estimate given by the formula
$\ln(l_x/l_{x+1})$. Such an estimate corresponds to the so-called exponential interpolation which
we also discuss in detail in Section 1.5.2. Since Column 3 contains rounded numbers, when
calculating figures in Column 8, we proceeded directly from the data in Column 2. More
precisely, since $l_{x+1} = l_x \cdot p_x = l_x \cdot (1 - q_x)$, we can write that

    $\ln(l_x/l_{x+1}) = -\ln(l_{x+1}/l_x) = -\ln(1 - q_x)$.    (1.5.1)

This is the formula we use for figures from Column 8. See again Section 1.5.2 for detail.
The reader can observe that for small and moderate ages, the estimates in Column 2 and
in Column 8 are close, but for large ages they somewhat differ. See also Example 1.5.2-1.
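Formula (1.5.1) is a one-line computation. The sketch below (the function name is ours) applies it to three values of $q_x$ from Column 2, for ages 0, 70, and 99, so that the results can be compared with the corresponding entries of Column 8.

```python
import math


def force_from_q(q_x: float) -> float:
    """Estimate of mu(x) by (1.5.1): -ln(1 - q_x).

    math.log1p(-q) computes ln(1 - q) accurately for small q.
    """
    return -math.log1p(-q_x)


for age, q_x in [(0, 0.006971), (70, 0.023635), (99, 0.265146)]:
    print(age, q_x, round(force_from_q(q_x), 6))
```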


In Fig.2-4, we graph the empirical survival function, density, and force of mortality,
respectively. These empirical functions are the estimates of the respective theoretical
functions s(x), f(x), and $\mu(x)$. The estimates are based on the data presented in Table 1. More
precisely, in Fig.2 we graph $l_x/l_0$, and in Fig.3 we graph $d_x/l_0$. The latter empirical curve
has been slightly smoothed in the region of ages 82-84. The graph in Fig.4 represents the
figures from Column 8. All graphs look quite natural, the survival function is concave in
practically the whole range of ages, and the force of mortality is low for moderate ages. All
of this attests to good living conditions in this country at this time.




FIGURE 2. An estimate of the survival function s(x) based on Table 1. (Horizontal axis: age x.)


FIGURE 3. An estimate of the density f(x) based on Table 1. (Horizontal axis: age x.)

FIGURE 4. An estimate of the force of mortality $\mu(x)$ based on Table 1. (Horizontal axis: age x.)

EXAMPLE 1. On the basis of Table 1, we estimate the values of the following
characteristics.

(a) The density value f(11). A rough estimate is $\frac{d_{11}}{l_0} = \frac{15}{100000} = 0.00015$.

(b) The probability s(70). We may estimate it by $\frac{l_{70}}{l_0} = \frac{75335}{100000} = 0.75335$.

(c) The probability that a person of age 30 will live at least 50 years more. We have

    ${}_{50}p_{30} = \frac{s(80)}{s(30)} = \frac{l_{80}}{l_{30}} = \frac{52178}{97740} \approx 0.534$.

(d) The probability that a person of age 20 will die between the ages 70 and 80. In this
case, we deal with

    ${}_{50|10}q_{20} = \frac{l_{70} - l_{80}}{l_{20}} = \frac{75335 - 52178}{98672} \approx 0.235$.

(e) The mode of X, that is, the “most probable” value, and local extrema of f(x).
Analyzing the table, we see that the maximum value of $d_x$ is attained at x = 85.

There are also local maxima at x = 0, 19, 21, 26. The fact that we observe a local
maximum at x = 0 is understandable since it concerns the mortality of newborns. As to the
three other ages, we should remember that we are dealing with estimates, and we must not
think that, for example, the age of 20 is less dangerous than that of 21. However, we may
conclude that the region of 19-26 is characterized by having a local mortality maximum
(see also Fig.3). The reader is invited to give her/his explanation of this fact.

The minimum corresponds to the region of 6 to 10.

(f) The force of mortality attains its minimum in the same region of 6-10 and is
increasing except at x = 0. In Exercise 34, the reader is encouraged to interpret all these results in
more detail.
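Parts (b)-(d) of this example reduce to ratios of table entries, which may be sketched as follows. The dictionary holds only the values of $l_x$ quoted in the example itself, and the function names are ours.

```python
# Values of l_x quoted in the example (ages 0, 20, 30, 70, 80).
l = {0: 100_000, 20: 98_672, 30: 97_740, 70: 75_335, 80: 52_178}


def n_p_x(n: int, x: int) -> float:
    """n_p_x estimated as l_{x+n} / l_x, as in (1.4.4)."""
    return l[x + n] / l[x]


def deferred_q(t: int, u: int, x: int) -> float:
    """t|u_q_x estimated as (l_{x+t} - l_{x+t+u}) / l_x, by (1.2.6) and (1.4.4)."""
    return (l[x + t] - l[x + t + u]) / l[x]


print(n_p_x(70, 0))                     # part (b): s(70)
print(round(n_p_x(50, 30), 3))          # part (c)
print(round(deferred_q(50, 10, 20), 3)) # part (d)
```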


1.5.2 Interpolation for fractional ages 


Except for the first year of life, life tables usually give the survival probabilities for integer
years, that is, for the curtate lifetime K. To find the probabilities ${}_tq_x$ for all t's and x's, we
should establish interpolation rules for the distribution of the lifetime between integers. We
consider here the following two types of interpolation.

Linear interpolation, or the uniform distribution over each year of age. Assume that
for any interval with integer endpoints, if death occurs in this interval, all possible time
moments are equally likely to be the time of death. This means that the density $f_X(x)$
is constant in any interval (k, k+1) where k is an integer. Since $f_X(x) = -s'(x)$, this
implies that $s'(x)$ is constant in each (k, k+1), and hence s(x) is linear in each (k, k+1).
Consequently, taking an intermediate point k + t, where $0 \le t \le 1$, we may write that

    $s(k+t) = (1-t)\,s(k) + t\,s(k+1)$.    (1.5.2)

In case (1.5.2), we may readily write a formula for the force of mortality. By (1.1.7),
$\mu(k+t) = -s'(k+t)/s(k+t)$, and differentiating (1.5.2) in t, we have $s'(k+t) = s(k+1) - s(k)$.
This leads to

    $\mu(k+t) = \frac{s(k) - s(k+1)}{s(k+t)}$ for $t \in [0,1]$.    (1.5.3)

For t = 0, it coincides with $q_k$.
Exponential interpolation. Now let us assume that within any interval (k, k+1), the force
of mortality is a constant $\mu_k$. Then by (1.1.10), for $t \in (0,1)$,

    $\ln s(k+t) = -\int_0^{k+t}\mu(z)\,dz = -\int_0^k \mu(z)\,dz - \int_k^{k+t}\mu(z)\,dz = \ln s(k) - \int_k^{k+t}\mu_k\,dz = \ln s(k) - \mu_k t$.    (1.5.4)

Setting t = 1, we have $\ln s(k+1) = \ln s(k) - \mu_k$, which gives

    $\mu_k = \ln s(k) - \ln s(k+1) = \ln[s(k)/s(k+1)]$.    (1.5.5)

We used this interpolation in Table 1.5.1-1. Substitution into (1.5.4) leads to

    $\ln s(k+t) = (1-t)\ln s(k) + t\ln s(k+1)$.    (1.5.6)


Comparing it with (1.5.2), we see that, in this case, linear interpolation is being carried for 
In s(x) rather than for s(x). 

Some straightforward examples are considered in Exercise 37. In Exercise 38, we show 
that linear interpolation leads to precise formulas if X is uniformly distributed, and the same 
is true for exponential interpolation if X is an exponential r.v. 
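The two interpolation rules can be put side by side in a short sketch. The function names are ours; for the numerical illustration we take $s(70) = 0.75335$ and $q_{70} = 0.023635$ from Table 1 and set $s(71) = s(70)(1 - q_{70})$.

```python
import math


def s_linear(s_k: float, s_k1: float, t: float) -> float:
    """Linear interpolation (1.5.2) of the survival function at age k + t."""
    return (1.0 - t) * s_k + t * s_k1


def s_exponential(s_k: float, s_k1: float, t: float) -> float:
    """Exponential interpolation (1.5.6): linear interpolation of ln s."""
    return math.exp((1.0 - t) * math.log(s_k) + t * math.log(s_k1))


def mu_linear(s_k: float, s_k1: float, t: float) -> float:
    """Force of mortality (1.5.3) under linear interpolation."""
    return (s_k - s_k1) / s_linear(s_k, s_k1, t)


def mu_exponential(s_k: float, s_k1: float) -> float:
    """Constant force of mortality (1.5.5) under exponential interpolation."""
    return math.log(s_k / s_k1)


s_70 = 0.75335
s_71 = s_70 * (1.0 - 0.023635)
print(mu_linear(s_70, s_71, 0.0))  # equals q_70
print(mu_exponential(s_70, s_71))  # equals -ln(1 - q_70)
print(s_linear(s_70, s_71, 0.5), s_exponential(s_70, s_71, 0.5))
```

Both rules agree at the integer endpoints by construction; between them, the exponentially interpolated value is a weighted geometric mean of $s(k)$ and $s(k+1)$ and so never exceeds the linear one.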


Now, let us compare the two above estimates of $\mu(k)$ for an integer k. As has been noted
repeatedly,

    $q_k = \frac{s(k) - s(k+1)}{s(k)} = \frac{l_k - l_{k+1}}{l_k}$,

and the estimate based on (1.5.3) equals $q_k$.
The estimate based on (1.5.5) is equal to $-\ln(1 - q_k)$ as shown in (1.5.1).
As we know, $-\ln(1 - z) \approx z$ for small z, or more precisely, $-\ln(1 - z) = z + \frac{1}{2}z^2 + o(z^2)$;
see (0.4.2.8). Thus, for small $q_k$, both estimates should be close; for larger $q_k$, they will
slightly differ. The latter case corresponds to large ages.


EXAMPLE 1. Consider estimates of $\mu(k)$ in Table 1.5.1-1. The estimates based on
(1.5.3) coincide with $q_k$, and Column 2 contains estimates based on linear interpolation.
Column 8 corresponds to exponential interpolation. We see that for ages from one to
relatively moderate numbers, the estimates either coincide up to the 6th digit or differ at most
at the 5th digit. For larger ages, the difference is larger. For k equal to 70, for example,
the two respective estimates are 0.023635 and 0.023919. However, the estimates for $\mu(90)$
are 0.142761 and 0.154039. For k = 99, the corresponding numbers are 0.265146 and
0.308083.

Note again that we consider such precise numbers merely for illustration. Of course, it
does not mean that the real estimates are accurate up to the sixth digit. On the contrary,
the accuracy of the original estimates of $q_k$ from which we proceed may be much less, and
in this case, the difference between the two interpolations may not have essential significance.


EXAMPLE 2 ([153, N31]⁴). You are given the following: (a) $q_{70} = 0.04$; (b) $q_{71} = 0.05$;
(c) deaths are uniformly distributed within each year of age. Calculate the probability that
(70) will die between ages 70.5 and 71.5. We may write

    $P(0.5 < T(70) \le 1.5) = P(0.5 < T(70) \le 1) + P(1 < T(70) \le 1.5)$
    $= P(0.5 < T(70) \le 1) + P(T(70) \le 1.5 \mid T(70) > 1)\,P(T(70) > 1)$
    $= P(0.5 < T(70) \le 1) + P(T(71) \le 0.5)\,P(T(70) > 1)$.

Using the uniformity assumption, we have $P(0.5 < T(70) \le 1) = \frac{1}{2}P(0 < T(70) \le 1) = \frac{1}{2}q_{70}$,
and similarly, $P(T(71) \le 0.5) = \frac{1}{2}q_{71}$. Thus,

    $P(0.5 < T(70) \le 1.5) = \frac{1}{2}q_{70} + (1 - q_{70})\,\frac{1}{2}q_{71} = 0.044$.
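The decomposition used in this example can be written as a small function. This is only a sketch under the uniform-distribution-of-deaths assumption; the function name and its restriction to an interval that straddles exactly one birthday are ours.

```python
def prob_death_between(q_x: float, q_x1: float, a: float, b: float) -> float:
    """P(a < T(x) <= b) for 0 <= a <= 1 <= b <= 2, deaths uniform within each year.

    The first term is the probability of dying in the rest of the current year;
    the second is the probability of surviving the current year and then dying
    in the first (b - 1) part of the next one.
    """
    if not (0.0 <= a <= 1.0 <= b <= 2.0):
        raise ValueError("this sketch only covers 0 <= a <= 1 <= b <= 2")
    first_year = (1.0 - a) * q_x
    second_year = (1.0 - q_x) * (b - 1.0) * q_x1
    return first_year + second_year


print(prob_death_between(0.04, 0.05, 0.5, 1.5))
```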


⁴Reprinted with permission of the Casualty Actuarial Society.




Certainly, the assumptions above are merely used for approximation purposes. For
actual survival functions, neither the densities nor the forces of mortality are constant within
integer years. However, such assumptions may be close to reality (see, for example, a
remark on uniformity at the end of the next Section 1.6), and in any case, they lead, as a rule,
to a good interpolation accuracy. (Not to mention that, as we saw, for moderate ages different
interpolations lead to close results.)


1.6 Analytical laws of mortality 


Over many years, there has been a great deal of interest in finding analytical representa- 
tions for survival functions. All such attempts were based on the belief that the duration of 
human life is subject to relatively simple universal laws. These laws, if any, have not been 
found yet, and the formulas below should be viewed merely as possible approximations 
of “real” survival functions. Such approximations may be helpful in calculations and in a 
theoretical analysis. 

De Moivre in 1729 suggested the simplest approximation: the uniform distribution on an
interval $[0, \omega]$. In this case, for $x \in [0, \omega]$,

    $s(x) = 1 - \frac{x}{\omega}$, and $\mu(x) = 1/(\omega - x)$,    (1.6.1)


which may be easily obtained using (1.1.7). In the actuarial literature, one may often see 
references to this case as to De Moivre’s law of mortality. 

Certainly, if we consider the future lifetime of a newborn, this is a very rough approxi- 
mation. However, as we will see below, for some particular age intervals, such an approxi- 
mation may turn out to be not too bad. 

Gompertz (1825) considered the exponential force of mortality $\mu(x) = Be^{\alpha x}$, where B and
$\alpha$ are positive parameters. Substituting it into (1.1.10), after integration we get that

    $s(x) = \exp\{-B(e^{\alpha x} - 1)/\alpha\}$.
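For readers who want the intermediate step, the integration behind Gompertz's survival function can be written out as follows. We assume here that (1.1.10) is the representation $s(x) = \exp\{-\int_0^x \mu(z)\,dz\}$, which is how it is used in Section 1.5.2; its exact statement is not repeated in this section.

```latex
\int_0^x \mu(z)\,dz = \int_0^x B e^{\alpha z}\,dz
  = \frac{B}{\alpha}\left(e^{\alpha x} - 1\right),
\qquad\text{so that}\qquad
s(x) = \exp\left\{-\frac{B}{\alpha}\left(e^{\alpha x} - 1\right)\right\}.
```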
Makeham (1860) suggested to model $\mu(x)$ by

    $\mu(x) = Be^{\alpha x} + A$,    (1.6.2)

where an additional parameter $A > -B$. We may interpret (1.6.2) in the spirit of the scheme
at the end of Section 1.1, where the duration of life is influenced by two independent factors,
and the corresponding hazard rates are summed. In the case (1.6.2), $\mu_2(z)$ from (1.1.16) is
the constant A, and hence the moment of the lethal accident, $X_2$, is an exponential r.v. with
parameter A. In our interpretation, the first term in (1.6.2) corresponds to death from other
causes.

The interpretation above is, however, rather speculative and should be considered
optional. Moreover, the estimate of parameter A based on real data may turn out to be negative.
If it is larger than −B, the function $\mu(x)$ based on such an estimate will be positive, and we
can use it. So, at least in this instance, we should perceive the representation (1.6.2) as a
whole, not interpreting its separate terms.

Again applying (1.1.10), we obtain that for the case (1.6.2),

    $s(x) = \exp\{-Ax - B(e^{\alpha x} - 1)/\alpha\}$.    (1.6.3)
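As a sanity check on (1.6.3), one may compare the closed form with a direct numerical integration of the force (1.6.2). The sketch below does so; the parameter values are arbitrary illustrative numbers of our own (they are not the parameters behind any figure in this chapter), and we again assume that (1.1.10) reads $s(x) = \exp\{-\int_0^x \mu(z)\,dz\}$.

```python
import math


def makeham_force(x: float, A: float, B: float, alpha: float) -> float:
    """Makeham's force of mortality (1.6.2); A = 0 gives Gompertz's law."""
    return B * math.exp(alpha * x) + A


def makeham_survival(x: float, A: float, B: float, alpha: float) -> float:
    """Closed-form survival function (1.6.3)."""
    return math.exp(-A * x - B * (math.exp(alpha * x) - 1.0) / alpha)


def survival_by_integration(x: float, A: float, B: float, alpha: float,
                            steps: int = 100_000) -> float:
    """exp(-integral of the force over [0, x]), with the integral by the midpoint rule."""
    h = x / steps
    integral = sum(makeham_force((i + 0.5) * h, A, B, alpha) for i in range(steps)) * h
    return math.exp(-integral)


A, B, alpha = 0.001, 0.00005, 0.09  # hypothetical parameters, for illustration only
print(makeham_survival(80.0, A, B, alpha))
print(survival_by_integration(80.0, A, B, alpha))
```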


FIGURE 5. The empirical survival function based on the data from Table 1
and the Makeham and Weibull approximations. (Vertical axis: s(x); horizontal axis: age x.
Curves: the empirical curve, Makeham's law, Weibull's law.)


In 1889, Makeham generalized his law using three terms and representing the force of
mortality by

    $\mu(x) = A + Cx + Be^{\alpha x}$,    (1.6.4)

where C is another parameter. See also Exercise 32.
In 1939, Weibull suggested to approximate the force of mortality by the power function
$\mu(x) = Ax^n$, where A, n are parameters. In this case, the integration in (1.1.10) leads to

    $s(x) = \exp\{-Ax^{n+1}/(n+1)\}$.


It is interesting to check how these laws fit the data from Table 1. 
In Fig.5, the empirical survival function is graphed along with approximations concern- 
ing the Makeham and Weibull laws. The curves in Fig.5 correspond to the functions 


SMakeham(X) = exp{—0.001x — 0.0005 (e"°** — 1)}, 
Sweibul = exp{ —4(x/100)’}. 


We see that visually both curves fit reasonably well. The parameters chosen are not optimal 
and serve merely for illustrative purposes. 

One more interesting fact. In Fig.6, we graph the empirical survival function for T(75) in the age
interval [75,95]; that is, the function $l_{75+t}/l_{75}$ for $0 \le t \le 20$. The straight line in Fig.6 indicates the
linear trend. We know that the linear survival function corresponds to the uniform distribution. So,
we observe, actually, an amazing fact: the distribution of the remaining lifetime T(75) of people
aged 75 is very close to the distribution uniform on [0,20]. For example, it is almost equally likely that
a 75-year-old person chosen at random will die at age 76 or at age 94.

FIGURE 6. (Vertical axis: ${}_tp_{75}$; horizontal axis: t.)




Route 1 = page 437 


2 A MULTIPLE DECREMENT MODEL 


First, note that the theory above may serve for modeling not only lifetimes of real indi- 
viduals but the durations of various insurance contracts which may terminate for reasons 
other than death of the client. For example, a medical insurance may terminate if the client 
ceases paying premiums, or moves to another area. So, in general, it makes sense to talk 
about the time-until-termination rather than the time-until-death. 

Whether we talk about a real lifetime or the duration of a contract in general, it is often 
important to distinguish the causes of termination. For example, the benefit payment in a 
life insurance contract may depend on whether death occurs because of natural causes or 
as a result of an accident. The same contract may include also a one-time benefit payment 
in the case of disability, after which the contract terminates. In the last case, we distinguish 
the following three instances: “natural” death, a lethal accident, and disability. 

We begin with a general framework. 


2.1 A single life 


Assume that there are m possible causes of the termination of a contract. The model 
below involves two r.v.’s: the time-until-termination T and the number J of the cause which 
will become the reason for termination. The r.v. T is a continuous positive r.v., and the r.v. 
J takes on values 1,...,m. 

In this context, termination itself is called a decrement, and the corresponding model— 
the multiple decrement model. 

For certainty, we will talk about the future lifetime of a person of age x (a life-age- 
x), assuming that T = T(x) and J = J(x). The symbols for all characteristics below will 
involve the corresponding index x. In situations that do not involve age, this index should 
be omitted. 

Let 

$${}_tq_x^{(j)} = P(T(x) \le t,\; J(x) = j),$$


the probability that the decrement will occur before or at time t and it will happen due to 
cause j. 

As we know, the probability density of a one-dimensional r.v. is the derivative of the
corresponding d.f. Then, with regard to the continuous component T of the r.vec. (T,J), it
is natural to define the function

$$f_{TJ}(t, j) = \frac{\partial}{\partial t}\,{}_tq_x^{(j)}. \tag{2.1.1}$$

We omitted the index x in the l.-h.s. When it does not cause confusion, we will omit the
index TJ also, writing simply f(t, j).




The function f(t, j) may be considered the joint density of the vector (T,J), but we
should keep in mind that the r.v. J is discrete: while for a fixed j the function f(t, j)
plays the role of a density, for a fixed t this function is a mass function (that is, it deals
with the probability that J will be equal to j). More precisely, for any set B in the real line,

$$P(T \in B,\; J = j) = \int_B f(t, j)\,dt,$$

while for any set B and any set K of integers,

$$P(T \in B,\; J \in K) = \sum_{j \in K} \int_B f(t, j)\,dt. \tag{2.1.2}$$


Let ${}_tq_x = P(T(x) \le t)$, the marginal d.f. of the r.v. T(x). In actuarial notation, this
characteristic is often denoted by ${}_tq_x^{(\tau)}$ in order to emphasize that the function
refers to all causes. We will omit the superscript $(\tau)$.

Since $P(T(x) \le t) = \sum_{j=1}^m P(T \le t,\; J = j)$,

$${}_tq_x = \sum_{j=1}^m {}_tq_x^{(j)}. \tag{2.1.3}$$

As usual, we set ${}_tp_x^{(\tau)} = 1 - {}_tq_x^{(\tau)}$, that is, ${}_tp_x = 1 - {}_tq_x$.

Denote by $f_T(t)$ the marginal density of T. Similarly to (2.1.3),

$$f_T(t) = \sum_{j=1}^m f(t, j). \tag{2.1.4}$$


One can derive (2.1.4) either proceeding from the usual logic of finding marginal
distributions (see Section 0.1.3.2), or differentiating both sides of (2.1.3). In the latter
case, we recall that $f(t, j) = \frac{\partial}{\partial t}\,{}_tq_x^{(j)}$ by definition, and
$f_T(t) = \frac{\partial}{\partial t}\,{}_tq_x$ as is true for any density.


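Following the general definition (1.1.5) of hazard rate, we define the force of decrement
(mortality if we talk about a lifetime) corresponding to T(x) as

$$\mu_x(t) = \frac{f_{T(x)}(t)}{P(T(x) > t)} = \frac{f_{T(x)}(t)}{{}_tp_x}. \tag{2.1.5}$$

The force of decrement due to cause j is defined as

$$\mu_x^{(j)}(t) = \frac{f(t, j)}{P(T(x) > t)} = \frac{f(t, j)}{{}_tp_x}. \tag{2.1.6}$$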


Note that, while the numerator involves the number of only one cause, the denominator in
(2.1.6) is the same as in (2.1.5) and concerns all causes. The significance of
$\mu_x^{(j)}(t)$ may be clarified as follows. For an infinitesimally small dt,

$$\mu_x^{(j)}(t)\,dt = P(t < T(x) \le t + dt,\; J = j \mid T(x) > t). \tag{2.1.7}$$

The r.-h.s. of (2.1.7) is the probability that a person who has survived t years (that is, has
attained age x + t) will die within the (infinitesimally small) interval [t, t + dt], and that
it will happen from cause j.




From (2.1.4)-(2.1.6) it immediately follows that

$$\mu_x(t) = \sum_{j=1}^m \mu_x^{(j)}(t). \tag{2.1.8}$$

Using the general formula (1.1.10) that connects the hazard rate and the tail of the
distribution, we can write

$${}_tp_x = \exp\left\{-\int_0^t \mu_x(s)\,ds\right\}. \tag{2.1.9}$$


Two things are noteworthy. First, similar formulas for ${}_tp_x^{(j)}$ may not be true since
the denominator in (2.1.6) contains a probability concerning all causes. This is discussed in
detail in Section 2.2.

Secondly, as seen from (2.1.8)-(2.1.9), in order that ${}_tp_x \to 0$ as $t \to \infty$, it
suffices to require $\int_0^\infty \mu_x^{(j)}(s)\,ds$ to diverge only for some j rather than
for all j. This has a clear interpretation: the contract will eventually terminate iff at least
one cause acts. We discussed it in detail at the end of Section 1.1.

From (2.1.6) it follows that

$$f(t, j) = {}_tp_x\,\mu_x^{(j)}(t). \tag{2.1.10}$$

This allows us, in particular, to write a nice formula for P(J = j | T = t). Following the
usual logic of calculating conditional probabilities (see, e.g., Section 0.7.1 and, in
particular, (0.7.1.5)), we write

$$P(J = j \mid T = t) = \frac{f(t, j)}{f_T(t)}.$$

In view of (2.1.10),

$$P(J = j \mid T = t) = \mu_x^{(j)}(t)\,\frac{{}_tp_x}{f_T(t)}. \tag{2.1.11}$$


It is crucial that the second factor does not depend on j, while the total sum of these
probabilities, $\sum_{j=1}^m P(J = j \mid T = t)$, naturally equals one. Thus, since
P(J = j | T = t) is proportional to $\mu_x^{(j)}(t)$,

$$P(J = j \mid T = t) = \frac{\mu_x^{(j)}(t)}{\sum_{i=1}^m \mu_x^{(i)}(t)} = \frac{\mu_x^{(j)}(t)}{\mu_x(t)}. \tag{2.1.12}$$

(Summing up both sides of (2.1.11), we have
$1 = \frac{{}_tp_x}{f_T(t)} \sum_{j=1}^m \mu_x^{(j)}(t)$. The last equality in (2.1.12)
follows from (2.1.8).)

The marginal probability is given by

$$P(J = j) = \int_0^\infty f(t, j)\,dt. \tag{2.1.13}$$


EXAMPLE 1⁵. Let m = 2, $\mu_x^{(1)}(t) = 2t$, and $\mu_x^{(2)}(t) = 3t^2$ for a fixed x. The
constants 2 and 3 are chosen to make calculations below simpler. Actually, the choice of
constants is connected with the choice of a unit of time, so for a certain choice of units,
the constants above may be realistic. We discuss this at the end of the example.

⁵The idea of this example is the same as in examples from Section 10.2 of [19].

Our goal is to demonstrate how we can find the joint, marginal, and conditional
distributions for the random vector (T(x), J(x)), and also E{T(x)}.

By (2.1.8), $\mu_x(t) = 2t + 3t^2$. Since ${}_tp_x$, $\mu_x$, and $\mu_x^{(j)}$ do not depend
on x, we omit the index x in calculations. By (2.1.9),

$${}_tp = P(T > t) = \exp\left\{-\int_0^t (2s + 3s^2)\,ds\right\} = \exp\{-t^2 - t^3\} \quad \text{for } t \ge 0.$$

By (2.1.10),

$$f(t, 1) = 2t \exp\{-t^2 - t^3\}, \qquad f(t, 2) = 3t^2 \exp\{-t^2 - t^3\}.$$

By (2.1.4),

$$f_T(t) = (2t + 3t^2) \exp\{-t^2 - t^3\}.$$


In view of (2.1.13), $P(J = 1) = \int_0^\infty f(t, 1)\,dt = \int_0^\infty 2t \exp\{-t^2 - t^3\}\,dt$.
The last integral is not analytically computable, but using appropriate software, it is easy
to obtain that P(J = 1) ≈ 0.527, and accordingly, P(J = 2) = 1 − P(J = 1) ≈ 0.473.

By (2.1.12),

$$P(J = 1 \mid T = t) = \frac{2t}{2t + 3t^2}, \qquad P(J = 2 \mid T = t) = \frac{3t^2}{2t + 3t^2}.$$

Now,

$$E\{T\} = \int_0^\infty t f_T(t)\,dt = \int_0^\infty t(2t + 3t^2) \exp\{-t^2 - t^3\}\,dt \approx 0.664$$

(to obtain this number we again use software).

The conditional density of T given J = j is f(t | j) = f(t, j)/P(J = j). Hence,

$$E\{T \mid J = 1\} = \frac{1}{P(J = 1)} \int_0^\infty t f(t, 1)\,dt = \frac{1}{P(J = 1)} \int_0^\infty 2t^2 \exp\{-t^2 - t^3\}\,dt \approx \frac{0.315}{0.527} \approx 0.598.$$

We may find E{T | J = 2} in the same way, but it is easier to write that

$$E\{T\} = E\{T \mid J = 1\}\,P(J = 1) + E\{T \mid J = 2\}\,P(J = 2).$$

The only unknown here is E{T | J = 2}. Calculations lead to E{T | J = 2} ≈ 0.737.

If the unit of time is one year, the answers do not appear realistic, but if we assume that
the unit is 100 years, and, for example, x = 10 years, then E{T} ≈ 66.4 years,
E{T | J = 1} ≈ 59.8 years, and E{T | J = 2} ≈ 73.7 years, which does not look improbable.
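The numbers in this example come from numerical integration, and a short script makes them easy to reproduce. The sketch below is only an illustration: it uses a hand-written composite Simpson rule in place of "appropriate software", truncates the integrals at t = 6 (an assumption, justified by the negligible tail), and computes E{T} as the integral of the tail probability. All function and variable names are ours.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def tail(t):
    # tp = exp{-t^2 - t^3}, from (2.1.9) with mu(t) = 2t + 3t^2
    return math.exp(-t**2 - t**3)

def f(t, j):
    # joint density (2.1.10): f(t, j) = tp * mu^(j)(t)
    return tail(t) * (2 * t if j == 1 else 3 * t**2)

UPPER = 6.0  # assumption: the tail beyond t = 6 is negligible (about exp(-252))

p1 = simpson(lambda t: f(t, 1), 0.0, UPPER)          # P(J = 1)
p2 = simpson(lambda t: f(t, 2), 0.0, UPPER)          # P(J = 2)
mean_T = simpson(tail, 0.0, UPPER)                    # E{T} as the integral of the tail
mean_T_given_1 = simpson(lambda t: t * f(t, 1), 0.0, UPPER) / p1
mean_T_given_2 = simpson(lambda t: t * f(t, 2), 0.0, UPPER) / p2

if __name__ == "__main__":
    print(p1, p2, mean_T, mean_T_given_1, mean_T_given_2)
```

The total-expectation identity used above to find E{T | J = 2} serves as a built-in consistency check on the five numbers.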


EXAMPLE 2. Let $q_{50}^{(1)} = 0.05$, $q_{51}^{(1)} = 0.06$, $q_{50}^{(2)} = 0.01$,
$q_{51}^{(2)} = 0.02$, and $q_{52}^{(2)} = 0.03$. Find ${}_{2|}q_{50}^{(2)}$, i.e., the
probability that a life (50) will terminate between ages 52 and 53, and it will happen from
cause 2. (We omit 1 in ${}_{t|1}q_x$.)

We have ${}_{2|}q_{50}^{(2)} = {}_2p_{50} \cdot q_{52}^{(2)}$ (the probability that the person
will survive to age 52, and subsequently, being 52 years old, will die within a year from
cause 2). Next, ${}_2p_{50} = p_{50}\,p_{51}$.

By (2.1.3), $q_{50} = q_{50}^{(1)} + q_{50}^{(2)} = 0.06$. Similarly, $q_{51} = 0.08$, and
consequently, $p_{50} = 1 - q_{50} = 0.94$, and $p_{51} = 0.92$. Then
${}_2p_{50} = 0.94 \cdot 0.92 = 0.8648$, and
${}_{2|}q_{50}^{(2)} = 0.8648 \cdot 0.03 \approx 0.0259$.
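The same arithmetic can be organized as a small routine. The following is a minimal sketch; the dictionary layout and the function names `total_q` and `deferred_q` are our own choices, not notation from the text.

```python
# q[cause][age]: one-year probabilities of decrement from Example 2
q = {1: {50: 0.05, 51: 0.06}, 2: {50: 0.01, 51: 0.02, 52: 0.03}}

def total_q(age):
    """q_x as the sum over causes, formula (2.1.3)."""
    return sum(by_age[age] for by_age in q.values())

def deferred_q(start_age, defer_years, cause):
    """Probability of surviving `defer_years` and then terminating within one year from `cause`."""
    survival = 1.0
    for age in range(start_age, start_age + defer_years):
        survival *= 1.0 - total_q(age)          # multiply the one-year p_x
    return survival * q[cause][start_age + defer_years]

answer = deferred_q(50, 2, cause=2)             # 0.94 * 0.92 * 0.03
```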


2.2 Another view: net probabilities of decrement 


For the sake of simplicity, let us consider only two causes. Suppose that we can represent
the termination time T(x) by the relation

$$T(x) = \min\{T_1(x), T_2(x)\}, \tag{2.2.1}$$

where the r.v. $T_1 = T_1(x)$ may be interpreted as the time of termination due to the first
cause if the second cause does not occur (or, more precisely, does not act), and
$T_2 = T_2(x)$ is the corresponding characteristic for the second cause. Since termination may
occur from either cause, whichever occurs first, we have written above
$\min\{T_1(x), T_2(x)\}$.

We do not claim here that r.v.’s $T_1(x)$ and $T_2(x)$ with such properties can always be
determined. We simply assume for a while that it is possible.

Below, we omit the argument x.

For us, it is important to keep in mind that the r.v.’s $T_1$, $T_2$ may be dependent. For
example, healthier people (at least for some groups) are less prone to accidents.

In the above model, the event {J = 1} (termination comes from the first cause) coincides
with the event $\{T_1 \le T_2\}$, and the event {J = 2} with $\{T_1 > T_2\}$. (We suppose
$T_1$, $T_2$ are continuous r.v.’s, so where we write the non-strict inequality is not
important.)

In such a setup,

$${}_tq^{(1)} = P(T \le t,\; J = 1) = P(T \le t,\; T_1 \le T_2) = P(T_1 \le t,\; T_1 \le T_2), \tag{2.2.2}$$

$${}_tq^{(2)} = P(T \le t,\; J = 2) = P(T \le t,\; T_1 > T_2) = P(T_2 \le t,\; T_1 > T_2). \tag{2.2.3}$$

If we know the joint distribution of $(T_1, T_2)$, we know the functions ${}_tq^{(j)}$ and can
define and calculate the density f(t, j) and the rates $\mu^{(j)}(t)$, as we did in (2.1.1)
and (2.1.6). (We again omit x.)

Denote by $\tilde\mu^{(j)}(t)$ the marginal hazard rate for $T_j$. The tilde is used to
distinguish this characteristic from $\mu^{(j)}(t)$, the force of decrement due to cause j.

The question is whether $\tilde\mu^{(j)}(t)$ coincides with $\mu^{(j)}(t)$.

In general, it certainly does not since, for instance, $\tilde\mu^{(1)}(t)$ is a
characteristic of the marginal distribution of $T_1$, while $\mu^{(1)}(t)$ is specified by the
joint distribution of $(T_1, T_2)$. (As has been previously noted, the denominator in the
definition of $\mu^{(j)}(t)$ in (2.1.6) or the conditional probability in (2.1.7) involve the
r.v. $T = \min\{T_1, T_2\}$.)

We illustrate this below by an example, but it is worthwhile to emphasize that it is not
necessary: for the reason mentioned above, we should not expect that
$\tilde\mu^{(j)}(t) = \mu^{(j)}(t)$.




EXAMPLE 1. Let $(T_1, T_2)$ be uniformly distributed on the triangle
$\{(t_1, t_2) : t_1 \ge 0,\ t_2 \ge 0,\ t_1 + t_2 \le 1\}$; see Fig. 7.

It is easy to see from this figure that $P(T_1 > t) = (1 - t)^2$. (This probability equals the
area of the middle triangle divided by the total area with the latter being equal to 1/2.)
Hence, by (1.1.7), the marginal hazard rate
$\tilde\mu^{(1)}(t) = \frac{2(1 - t)}{(1 - t)^2} = \frac{2}{1 - t}$. On the other hand,
$P(\min\{T_1, T_2\} > t) = (1 - 2t)^2$ for $t \le 1/2$. (We should calculate the area of the
smallest designated triangle and divide it by the total area.)

The hazard rate corresponding to the last tail probability is
$\mu(t) = \frac{4(1 - 2t)}{(1 - 2t)^2} = \frac{4}{1 - 2t}$.

FIGURE 7.

We know that $\mu(t) = \mu^{(1)}(t) + \mu^{(2)}(t)$. On the other hand, by the symmetry of the
joint distribution, $\mu^{(1)}(t) = \mu^{(2)}(t)$. Hence,
$\mu^{(1)}(t) = \frac{1}{2}\mu(t) = \frac{2}{1 - 2t} \ne \frac{2}{1 - t} = \tilde\mu^{(1)}(t)$.
Certainly, the point of this example is that $T_1$, $T_2$ are dependent. (See, for instance,
Example 0.1.3.2-2.)
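To see the gap between the marginal hazard rate and the force of decrement numerically, one can differentiate the two tail probabilities of this example. The sketch below assumes the tails $(1 - t)^2$ and $(1 - 2t)^2$ obtained for the triangle and uses a central difference; the helper name `hazard` and the evaluation point t = 0.25 are ours.

```python
def hazard(surv, t, h=1e-6):
    """Hazard rate -s'(t)/s(t), with the derivative taken by a central difference."""
    return -(surv(t + h) - surv(t - h)) / (2 * h) / surv(t)

def surv_T1(t):
    # P(T1 > t) = (1 - t)^2, the marginal tail of T1
    return (1 - t) ** 2

def surv_min(t):
    # P(min{T1, T2} > t) = (1 - 2t)^2, valid for t < 1/2
    return (1 - 2 * t) ** 2

t = 0.25
marginal = hazard(surv_T1, t)        # tilde-mu^(1)(t) = 2/(1 - t)
force = hazard(surv_min, t) / 2      # mu^(1)(t) = mu(t)/2 = 2/(1 - 2t), by symmetry
```

At t = 0.25 the two rates are 8/3 and 4, so the dependence between the causes changes the rate noticeably.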


However, if the causes are independent, the situation is simpler.

Proposition 1 If $T_1$ and $T_2$ are independent, then

$$\tilde\mu^{(j)}(t) = \mu^{(j)}(t), \quad j = 1, 2. \tag{2.2.4}$$

A short proof will be given in Section 2.4. In Exercise 42, we discuss the assertion of
this proposition from a heuristic point of view.

Thus, if the causes act independently, the model is transparent, and the marginal hazard
rates coincide with the forces of decrement.

Before considering an example, note again that if $T_1$, $T_2$ are independent, then

$$P(\min\{T_1, T_2\} > t) = P(T_1 > t,\; T_2 > t) = P(T_1 > t)\,P(T_2 > t). \tag{2.2.5}$$


EXAMPLE 2. Let $T_1$ be the future lifetime of (50) in the case when we do not take into
account the possibility of accidents, and let $T_2$ be the time of a lethal accident, if any.
So, the real lifetime $T = \min\{T_1, T_2\}$. Assume that $T_1$ and $T_2$ are independent,
$T_1$ is uniformly distributed on [0,50] (which corresponds to De Moivre’s law with ω = 100),
and $T_2$ is exponential with a hazard rate of μ = 0.01.

Thus, the marginal force of mortality for $T_1$ is $\tilde\mu^{(1)}(t) = 1/(50 - t)$; see also
Exercise 7a. Clearly, $\tilde\mu^{(2)}(t) = \mu$.

By Proposition 1, $\mu_{50}^{(1)}(t) = 1/(50 - t)$, $\mu_{50}^{(2)}(t) = \mu$, and
$\mu_{50}(t) = \mu_{50}^{(1)}(t) + \mu_{50}^{(2)}(t) = \mu + 1/(50 - t)$. The last relation also
follows directly from (1.1.16).

To compute ${}_tp_{50} = P(T > t)$, we can apply (2.1.9), but it is much easier to make use of
(2.2.5) which implies that

$$P(T > t) = P(T_1 > t)\,P(T_2 > t) = (1 - t/50)\,e^{-\mu t} = (1 - t/50)\,e^{-0.01t}.$$

For example, ${}_{20}p_{50} = \frac{3}{5}\,e^{-0.01 \cdot 20} \approx 0.491$.
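The two routes mentioned in this example, formula (2.1.9) with the total force and the product rule (2.2.5), can be compared numerically. This is a sketch under the example's assumptions; the midpoint rule and the function names are our own choices.

```python
import math

MU = 0.01  # hazard rate of the accident time T2

def force_total(s):
    # mu_50(s) = mu + 1/(50 - s), valid for 0 <= s < 50
    return MU + 1.0 / (50.0 - s)

def survival_via_hazard(t, n=10_000):
    """tp via (2.1.9), integrating the total force by the midpoint rule."""
    h = t / n
    integral = sum(force_total((i + 0.5) * h) for i in range(n)) * h
    return math.exp(-integral)

def survival_via_product(t):
    """tp via (2.2.5): P(T1 > t) * P(T2 > t)."""
    return (1.0 - t / 50.0) * math.exp(-MU * t)
```

For t = 20 both functions return approximately 0.491, in agreement with the value above.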


Let us return to the general case. Assume that the density f(t, j), and hence the forces of
decrement $\mu^{(j)}(t)$, are given. Even if we can determine $T_1$, $T_2$ as above (which is
not always true), these r.v.’s may turn out to be dependent. The question is whether it is
possible to construct a model with some independent r.v.’s $T_1'$, $T_2'$, distinct from
$T_1$, $T_2$ and with the following properties.

(i) The marginal hazard rates of $T_1'$, $T_2'$ coincide with the respective forces of
decrement.

(ii) The r.v. $T' = \min\{T_1', T_2'\}$ has the same distribution as the original
time-until-termination T.

In this case, to find the distribution of T, we could consider the model with $T_1'$, $T_2'$,
which is simpler.

The positive answer to the above question follows almost immediately from Proposition
1. Let ${}_tp'^{(j)}$ be the tail probability function (or the survival function)
corresponding to $\mu^{(j)}(t)$. (We again omit the index x even if formally it should be
involved.) More precisely, let

$${}_tp'^{(j)} = \exp\left\{-\int_0^t \mu^{(j)}(s)\,ds\right\} \tag{2.2.6}$$

[compare with (2.1.9)]. Set

$${}_tq'^{(j)} = 1 - {}_tp'^{(j)}.$$

The function ${}_tq'^{(j)}$ is called a net probability of decrement, or an independent rate
of decrement, or an absolute rate of decrement.

Consider independent r.v.’s $T_1'$, $T_2'$ having the marginal d.f.’s ${}_tq'^{(1)}$,
${}_tq'^{(2)}$, respectively. More precisely, $P(T_j' \le t) = {}_tq'^{(j)}$ and, consequently,
$P(T_j' > t) = {}_tp'^{(j)}$.

We can always define such r.v.’s on some space of elementary outcomes and work with
them. For example, we can simulate values of these r.v.’s. Certainly, these r.v.’s are
artificially constructed. However, if the model based on such r.v.’s allows us to compute the
distributions of interest, this is just what is needed. We can view $T_1'$, $T_2'$ as r.v.’s
representing some “modified causes” or as functions of the original r.v.’s $T_1$, $T_2$, but
this is only an optional interpretation. The model we are building serves for computational
purposes.

So, let us consider model (2.2.1)-(2.2.3) replacing $T_1$, $T_2$ by $T_1'$, $T_2'$.
Proposition 1 asserts that in this new model the forces of decrement coincide with the
original forces $\mu^{(j)}(t)$. Moreover, let us recall that, by (2.1.8)-(2.1.10), f(t, j) is
specified by $\mu^{(1)}(t)$, $\mu^{(2)}(t)$ in a unique way. Consequently, the analog of the
density f(t, j) in our new model is the same as the original density f(t, j). (In particular,
we use the same symbol f(t, j) in both models.)

Then, of course, all other probabilities [such as P(J = j | T = t) or P(J = j)] and all
expectations [such as E{T} or E{T | J = j}] will be the same in the new model as they
were in the original model.

To clarify the last assertion, consider one instance.

Let $T' = \min\{T_1', T_2'\}$. Since $T_1'$ and $T_2'$ are independent, in accordance with
(2.2.5),

$$P(T' > t) = P(T_1' > t)\,P(T_2' > t) = {}_tp'^{(1)}\,{}_tp'^{(2)}. \tag{2.2.7}$$

Substituting (2.2.6), we have

$$P(T' > t) = \exp\left\{-\int_0^t \mu^{(1)}(s)\,ds\right\} \exp\left\{-\int_0^t \mu^{(2)}(s)\,ds\right\} = \exp\left\{-\int_0^t \left(\mu^{(1)}(s) + \mu^{(2)}(s)\right)ds\right\}, \tag{2.2.8}$$

which coincides with ${}_tp$ in (2.1.9)-(2.1.8). In particular, in view of (2.2.7), we can write

$${}_tp = {}_tp'^{(1)}\,{}_tp'^{(2)}. \tag{2.2.9}$$


EXAMPLE 3. In a certain sense, in this example the beginning and end of Example 2
are switched. Let us put aside for a while the computations in Example 2 and start from
scratch. We restrict ourselves to two causes of death of (50): natural causes and causes
resulting from an accident. Assume that somehow we have determined that the respective forces
of decrement are $\mu^{(1)}(t) = 1/(50 - t)$ and $\mu^{(2)}(t) = \mu = 0.01$. Suppose, for
example, that we want to compute ${}_{20}p_{50}$. Formally, we can substitute the above
formulas for $\mu^{(1)}(t)$ and $\mu^{(2)}(t)$ into (2.1.9). However, it is much easier to use
(2.2.9), adding the now necessary index x = 50. Because $\mu^{(1)}(t)$ is the force of
mortality of the distribution uniform on [0, 50], the “net probability” ${}_tp'^{(1)}_{50}$
corresponds to the distribution mentioned and, hence, equals 1 − t/50. Similarly,
${}_tp'^{(2)}_{50} = e^{-\mu t}$. Then, by (2.2.9),
${}_tp_{50} = {}_tp'^{(1)}_{50}\,{}_tp'^{(2)}_{50} = (1 - t/50)\,e^{-\mu t}$ for t ≤ 50, and for
${}_{20}p_{50}$, we will get the same answer as in Example 2.

It makes sense to emphasize again that we did not assume in our calculations that the
real causes acted independently. We just proceeded from an artificial model with (other)
independent causes. However, this model was constructed in a way that led to a correct
result.
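Since the artificial independent r.v.’s can be simulated, the answer of this example can also be checked by Monte Carlo. The sketch below draws one variable uniform on [0, 50] and one exponential with rate 0.01, both taken from the net probabilities of the example; the sample size, the seed, and the function name are arbitrary choices of ours.

```python
import math
import random

def simulate_survival(t, n=200_000, seed=1):
    """Monte Carlo estimate of P(min{T1', T2'} > t) for independent T1', T2'."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(n):
        t1 = 50.0 * rng.random()        # tail 1 - s/50: uniform on [0, 50]
        t2 = rng.expovariate(0.01)      # tail e^{-0.01 s}: exponential with rate 0.01
        if min(t1, t2) > t:
            survived += 1
    return survived / n

estimate = simulate_survival(20.0)
exact = (1 - 20 / 50) * math.exp(-0.01 * 20)    # (1 - t/50) e^{-mu t} at t = 20
```

With this sample size the standard error is about 0.001, so the estimate should agree with the exact value to roughly two decimal places.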


Some additional remarks. 


1. As mentioned before, $T_1'$, $T_2'$ are artificially constructed, and the corresponding
model is designed merely to simplify reasoning and calculations. In particular, $T_1'$ is not
equal to $T_1$, i.e., to the termination time in the hypothetical situation when only the
first cause is in effect. If we know the hazard rates $\mu^{(1)}(t)$ and $\mu^{(2)}(t)$, we
can find the joint distribution of (T, J) but we cannot find, proceeding only from the rates
$\mu^{(1)}(t)$ and $\mu^{(2)}(t)$, the distributions of $T_1$ and $T_2$. It may be shown that
different joint distributions of $(T_1, T_2)$ can lead to the same joint distribution of
(T, J).

This reflects the real world situation. We may observe the moment of termination and
determine the cause, but proceeding from this information, we cannot say what would have
happened if the cause had not occurred.

Assume, for example, that the termination of a contract may occur either if the client
dies or moves to another area. The insurance organization has data on when terminations
occurred and, in each case, the reason for termination. However, the organization cannot
estimate from this data the distribution of lifetimes.




2. The reader may readily generalize the above model to the case of m > 2 causes. In this
case, definition (2.2.6) should be considered for j = 1, ..., m. We may define in a similar
manner r.v.’s $T_1', \dots, T_m'$, and replace (2.2.9) by

$${}_tp = \prod_{j=1}^m {}_tp'^{(j)}.$$


3. We will write down an explicit relation between the quantities $q_x^{(j)}$ (which
equals ${}_1q_x^{(j)}$) and $q_x'^{(j)}$ (which equals ${}_1q_x'^{(j)}$). We will do so under
the interpolation assumption that all $\mu_x^{(j)}(t)$ are constant in the interval
(x, x + 1). Then the same is true for $\mu_x(t) = \sum_j \mu_x^{(j)}(t)$. The values of the
functions at the endpoints of (x, x + 1) do not matter if we are interested in the
probabilities ${}_tq^{(j)}$ and ${}_tq'^{(j)}$ which are specified by integrals of μ’s. (See,
for example, (2.2.6) and recall that ${}_tq'^{(j)} = 1 - {}_tp'^{(j)}$.) It may be shown that
under the assumption we have made,

$$\frac{q_x^{(j)}}{q_x} = \frac{\ln p_x'^{(j)}}{\ln p_x}, \tag{2.2.10}$$

which together with $q_x'^{(j)} = 1 - p_x'^{(j)}$ gives the desired relation.
A particular example is given in Exercise 50.

> To prove (2.2.10), denote by $c_{jx}$ and $c_x$ the (constant) values of $\mu_x^{(j)}(t)$
and $\mu_x(t)$ on (x, x + 1), respectively. By (2.1.10),

$$q_x^{(j)} = \int_0^1 f(t, j)\,dt = \int_0^1 {}_tp_x\,\mu_x^{(j)}(t)\,dt = c_{jx} \int_0^1 {}_tp_x\,dt.$$

Similarly, $q_x = \int_0^1 f_T(t)\,dt = \int_0^1 {}_tp_x\,\mu_x(t)\,dt = c_x \int_0^1 {}_tp_x\,dt$.
Hence, $\frac{q_x^{(j)}}{q_x} = \frac{c_{jx}}{c_x}$. On the other hand, by (2.2.6), for
$p_x'^{(j)}$, we have $\ln(p_x'^{(j)}) = -\int_0^1 \mu_x^{(j)}(s)\,ds = -c_{jx}$. Similarly,
$\ln(p_x) = -\int_0^1 \mu_x(s)\,ds = -c_x$. Combining all of this, we come to (2.2.10). <


4. As was already mentioned, among the r.v.’s $T_1', \dots, T_m'$, only one, $T_1'$ say, must
be proper; that is, such that $P(T_1' > x) \to 0$ as $x \to \infty$. It may happen that for
some $T_j'$ this is not true, which means that this cause with positive probability will never
act. See also the scheme at the end of Section 1.1.
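Relation (2.2.10) from Remark 3 is easy to turn into a conversion routine from probabilities of decrement to net probabilities. The sketch below assumes constant forces within the year of age, as in that remark; the function name `net_probabilities` and the values of the forces are hypothetical and serve only to check the formula against the closed form $q_x'^{(j)} = 1 - e^{-c_{jx}}$.

```python
import math

def net_probabilities(q_by_cause):
    """Net probabilities q'_x^(j) from one-year decrement probabilities q_x^(j), via (2.2.10)."""
    q_total = sum(q_by_cause)
    p_total = 1.0 - q_total
    # (2.2.10): ln p'^(j) = (q^(j)/q) ln p, so p'^(j) = p ** (q^(j)/q)
    return [1.0 - p_total ** (qj / q_total) for qj in q_by_cause]

# Hypothetical constant forces on the year of age, used only to check the formula.
c = [0.02, 0.05]
c_total = sum(c)
# With constant forces, q^(j) = (c_j / c)(1 - e^{-c}).
q = [cj / c_total * (1 - math.exp(-c_total)) for cj in c]
q_net = net_probabilities(q)
```

Note that each net probability exceeds the corresponding probability of decrement, since in the net model the competing cause cannot remove a life first.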


2.3 Survivorship group 


Consider a group of $l_a$ people of the same age a. All lifetimes are independent and
correspond to the model of Section 2.1. Let ${}_n\mathcal{D}_x^{(j)}$ be the number of people
who will die within the age interval [x, x + n] from cause j, and let $\mathcal{L}(x)$ be the
number of people who will survive to age x. Similar to what was shown in Section 1.4, both
random variables are binomial with probabilities of success
$P(x - a < T(a) \le x - a + n,\; J = j)$ and $P(T(a) > x - a)$, respectively. The latter
probability is ${}_{x-a}p_a$, and the former is ${}_{x-a}p_a \cdot {}_nq_x^{(j)}$ (the
individual lives until age x, and after that, as an x-year old, dies within n years from
cause j). Hence, setting ${}_nd_x^{(j)} = E\{{}_n\mathcal{D}_x^{(j)}\}$ and
$l_x = E\{\mathcal{L}(x)\}$, similar to Section 2.1, we have

$${}_nd_x^{(j)} = E\{{}_n\mathcal{D}_x^{(j)}\} = l_a \cdot {}_{x-a}p_a \cdot {}_nq_x^{(j)}, \tag{2.3.1}$$

$$l_x = l_a \cdot {}_{x-a}p_a. \tag{2.3.2}$$

Using the second equality, we replace $l_a \cdot {}_{x-a}p_a$ by $l_x$ in the first. This
leads to

$${}_nd_x^{(j)} = l_x \cdot {}_nq_x^{(j)}.$$

In (2.3.2), we set a = x − 1, which implies

$$l_x = l_{x-1} \cdot {}_1p_{x-1} = l_{x-1}\,p_{x-1}. \tag{2.3.3}$$

(We omit the prefix 1 in ${}_1p_{x-1}$ and similar characteristics.) Next, we set n = 1 in
(2.3.1), and omitting the prefix one as usual, we come to

$$d_x^{(j)} = l_x\,q_x^{(j)}. \tag{2.3.4}$$

⁶In the exposition of this section and the example below we follow the logic of Section 10.2 from [19].


EXAMPLE 1. Let $l_{70} = 100$ and the number of causes m = 3. The corresponding
decrement probabilities are given below:

 x  | q_x^(1) | q_x^(2) | q_x^(3)
 70 | 0.02    | 0.01    | 0.05
 71 | 0.03    | 0.02    | 0.06
 72 | 0.04    | 0.03    | 0.07

Find $l_x$ and $d_x^{(j)}$. In view of (2.1.3), $q_x = q_x^{(1)} + q_x^{(2)} + q_x^{(3)}$ and
$p_x = 1 - q_x$. Using the recurrence formula (2.3.3) and formula (2.3.4), we have

 x  | q_x^(1) | q_x^(2) | q_x^(3) | q_x  | p_x  | l_x = l_{x-1} p_{x-1} | d_x^(1) = l_x q_x^(1) | d_x^(2) | d_x^(3)
 70 | 0.02    | 0.01    | 0.05    | 0.08 | 0.92 | 100                   | 2                     | 1       | 5
 71 | 0.03    | 0.02    | 0.06    | 0.11 | 0.89 | 92                    | 2.76                  | 1.84    | 5.52
 72 | 0.04    | 0.03    | 0.07    | 0.14 | 0.86 | 81.88                 | 3.2752                | 2.4564  | 5.7316
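The table can be generated mechanically from the recurrence (2.3.3) and formula (2.3.4). The following is a minimal sketch with the data of the example; the function name `survivorship_table` and the row layout are our own.

```python
# One-year decrement probabilities q_x^(j) from the example, keyed by age x.
q = {70: (0.02, 0.01, 0.05), 71: (0.03, 0.02, 0.06), 72: (0.04, 0.03, 0.07)}

def survivorship_table(l_start, q_by_age):
    """Rows (x, l_x, [d_x^(1), ..., d_x^(m)]) built from (2.3.3) and (2.3.4)."""
    rows, l_x = [], l_start
    for x in sorted(q_by_age):
        causes = q_by_age[x]
        rows.append((x, l_x, [l_x * qj for qj in causes]))   # d_x^(j) = l_x q_x^(j)
        l_x *= 1.0 - sum(causes)                              # l_{x+1} = l_x p_x
    return rows

table = survivorship_table(100.0, q)

if __name__ == "__main__":
    for x, l_x, d in table:
        print(x, round(l_x, 4), [round(v, 4) for v in d])
```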


2.4 Proof of Proposition 1 


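The index x is everywhere omitted. Let $f_j(t)$ and $F_j(t)$ be the marginal density and d.f.
of $T_j$, respectively. The event {J = 1} is equivalent to $\{T_1 < T_2\}$. Then, following the
general conditioning rule (0.7.3.9) and taking into account the independence of $T_1$ and
$T_2$, we have

$$\begin{aligned}
{}_tp^{(1)} &= P(T > t,\; J = 1) = P(\min\{T_1, T_2\} > t,\; T_2 > T_1) = P(T_2 > T_1 > t) \\
&= \int_t^\infty P(T_2 > T_1 > t \mid T_1 = s)\,f_1(s)\,ds = \int_t^\infty P(T_2 > s \mid T_1 = s)\,f_1(s)\,ds \\
&= \int_t^\infty P(T_2 > s)\,f_1(s)\,ds = \int_t^\infty (1 - F_2(s))\,f_1(s)\,ds.
\end{aligned}$$

Then, since ${}_tq^{(1)} + {}_tp^{(1)} = P(J = 1)$ does not depend on t,

$$f(t, 1) = \frac{\partial}{\partial t}\,{}_tq^{(1)} = -\frac{\partial}{\partial t}\,{}_tp^{(1)} = -\frac{\partial}{\partial t} \int_t^\infty (1 - F_2(s))\,f_1(s)\,ds = (1 - F_2(t))\,f_1(t).$$

On the other hand, by (2.2.5),

$${}_tp = P(T > t) = (1 - F_1(t))(1 - F_2(t)).$$

Hence, by definition,

$$\mu^{(1)}(t) = \frac{f(t, 1)}{{}_tp} = \frac{(1 - F_2(t))\,f_1(t)}{(1 - F_1(t))(1 - F_2(t))} = \frac{f_1(t)}{1 - F_1(t)}.$$

Again by definition, the last quantity is the marginal hazard rate of $T_1$. ■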


3 MULTIPLE LIFE MODELS 


In this section, we study survival characteristics of a set (or cohort) of lives when this 
set is considered as a whole, as “one client”. For example, a joint family pension plan 
terminates only when both spouses die. Another example is a family life insurance plan 
with benefits payable at the time of the first death. In such situations, we refer to a set of 
lives as a “status”. 

If we know how to define the future lifetime of a status, we will talk about a survival 
status. Certainly, how long the status will exist depends on the type of the status (actually, 
on the type of the insurance contract under consideration). In any case, the future lifetime 
(or the time-until-failure, or the termination time) for a status is a function of the lifetimes 
of the persons (lives) involved. 

Consider a status of two lives, (x) and (y), with respective future lifetimes T(x) and 
T (y). In general, these r.v.’s are dependent. The following two types are probably the most 
important. 


1. The joint-life status. The status is intact until the first death occurs, and hence the 
lifetime of the status is min{T (x), T (y)}. Traditionally, the status itself is denoted by 
the symbol x : y, and the status lifetime by T(x : y) or T (xy). 


2. The last-survivor status. In this case, termination occurs upon the last death, and
hence the lifetime of the status is max{T(x), T(y)}. The status is denoted by
$\overline{x:y}$, and the status lifetime by $T(\overline{x:y})$ or $T(\overline{xy})$.




Certainly, these two models do not exhaust all possible situations. For example, a joint 
life insurance contract may involve benefits payable upon each death, and the amount of 
benefits paid may depend on which death occurs first. We will consider such examples 
later. 

In any case, to describe the situation, we should know the joint distribution of the r.v.’s 
T(x) and T (y). 


3.1 The joint distribution 
It may be characterized either by the joint d.f.

$$F_{T(x)T(y)}(u, v) = P(T(x) \le u,\; T(y) \le v),$$

or by the joint survival function

$$s_{T(x)T(y)}(u, v) = P(T(x) > u,\; T(y) > v).$$

To make our exposition simpler, we will sometimes omit the index T(x)T(y), and since
x and y are fixed, we write $T_1$ and $T_2$ instead of T(x) and T(y). Later we will come back
to actuarial notation.

All calculations below are not specific and are based on general probability theory
formulas from Section 0.1.3.2. First of all, the marginal d.f.

$$F_{T_1}(u) = P(T_1 \le u) = P(T_1 \le u,\; T_2 < \infty) = F_{T_1T_2}(u, \infty) = F(u, \infty), \tag{3.1.1}$$

if we omit the index $T_1T_2$. Similarly, since the distributions of T’s are continuous, the
marginal survival function

$$s_{T_1}(u) = P(T_1 > u) = P(T_1 > u,\; T_2 \ge 0) = P(T_1 > u,\; T_2 > 0) = s_{T_1T_2}(u, 0).$$

Analogously,

$$F_{T_2}(v) = P(T_2 \le v) = F_{T_1T_2}(\infty, v), \quad \text{and} \quad s_{T_2}(v) = P(T_2 > v) = s_{T_1T_2}(0, v).$$

Next, we consider a connection between the joint d.f. and the survival function. In the
one-dimensional case, for any r.v. T, we write P(T ≤ x) = 1 − P(T > x). The counterpart
of this identity in the two-dimensional case is the identity

$$P(T_1 \le u,\; T_2 \le v) = 1 - P(T_1 > u) - P(T_2 > v) + P(T_1 > u,\; T_2 > v).$$

(The last term is added back since, when subtracting $P(T_1 > u)$ and $P(T_2 > v)$, we subtract
the probability of the set $\{T_1 > u,\; T_2 > v\}$ twice.) Thus,

$$F_{T_1T_2}(u, v) = 1 - s_{T_1}(u) - s_{T_2}(v) + s_{T_1T_2}(u, v). \tag{3.1.2}$$

In the same way, we get that

$$s_{T_1T_2}(u, v) = 1 - F_{T_1}(u) - F_{T_2}(v) + F_{T_1T_2}(u, v). \tag{3.1.3}$$


EXAMPLE 1. Let each r.v. $T_i$ (i = 1, 2) take on values from [0, 1], and let the joint density

$$f_{T_1T_2}(u, v) = C_k\left[1 - k(u - v)^2\right] \ \text{for } 0 \le u \le 1,\ 0 \le v \le 1, \ \text{and } = 0 \text{ otherwise}. \tag{3.1.4}$$

Here k is a parameter taking values from [0,1], and the constant $C_k = 6/(6 - k)$. Such a
constant is chosen in order that the total integral of the density equals one. The reader can
readily verify that this is true by direct integration.

The unit intervals in (3.1.4) are chosen for simplicity. Since we interpret T’s as lifetimes,
the unit of time is not one year but an appropriate number consistent with the ages x and y.

The parameter k characterizes the dependency between the r.v.’s $T_1$ and $T_2$. If k = 0, then
$f_{T_1T_2}(u, v) = 1$ for all u ∈ [0,1] and v ∈ [0,1], and $T_1$ and $T_2$ are independent and
uniformly distributed on [0, 1]. (See Example 0.1.3.2-2 which differs from what we consider now
only by scale.)

If k > 0, then the r.v.’s are dependent and, in a certain sense, the larger k, the stronger this
dependence is. In particular, note that density (3.1.4) reaches its maximum on the line u = v,
which means that the r.v.’s $T_1$ and $T_2$ “have a tendency to be close to each other rather
than to differ significantly”. We may interpret this as if the persons we consider are connected
with each other; for instance, they have common living conditions.

We consider k ≤ 1 because otherwise the density would be negative for some u, v.

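In accordance with (0.1.3.11), the d.f.

$$F_{T_1T_2}(u, v) = \int_0^v \int_0^u f_{T_1T_2}(t, s)\,dt\,ds = C_k \int_0^v \int_0^u \left(1 - k(t - s)^2\right) dt\,ds$$

$$= C_k\left[uv - k\left(\frac{uv(u^2 + v^2)}{3} - \frac{u^2v^2}{2}\right)\right] \ \text{for } 0 \le u \le 1,\ 0 \le v \le 1, \tag{3.1.5}$$

which may be obtained by straightforward integration. Since $P(T_2 \le 1) = 1$,

$$F_{T_1}(u) = F_{T_1T_2}(u, 1) = C_k\left[u - k\left(\frac{u(u^2 + 1)}{3} - \frac{u^2}{2}\right)\right] \ \text{for } 0 \le u \le 1. \tag{3.1.6}$$

Since the distribution is symmetric, the marginal distribution of $T_2$ is the same:

$$F_{T_2}(v) = C_k\left[v - k\left(\frac{v(v^2 + 1)}{3} - \frac{v^2}{2}\right)\right] \ \text{for } 0 \le v \le 1.$$

Inserting all of this into (3.1.3), after some algebra, we get that the joint survival function

$$s_{T_1T_2}(u, v) = 1 - C_k\left[u + v - uv - k\left(\frac{u(u^2 + 1) + v(v^2 + 1)}{3} - \frac{u^2 + v^2}{2} - \frac{uv(u^2 + v^2)}{3} + \frac{u^2v^2}{2}\right)\right]. \tag{3.1.7}$$

One can also compute

$$E\{T_1\} = \int_0^1 \left(1 - F_{T_1}(u)\right) du.$$

Here, straightforward calculations lead to a simple answer, namely, $E\{T_1\} = 1/2$. Thus,
the answer does not depend on k.

We will revisit this example several times later, using the formulas above for various
illustrations. In Exercise 53, the reader is invited to carry out the calculations in more
detail.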
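The closed form of the joint d.f. in this example can be checked against a direct numerical integration of the density (3.1.4). The sketch below is only such a check: the bracketed expression in `joint_df` is the result of integrating $1 - k(t - s)^2$ over the rectangle, and the midpoint rule, grid size, and function names are our own choices.

```python
def density(u, v, k):
    """Joint density (3.1.4) on the unit square."""
    return 6.0 / (6.0 - k) * (1.0 - k * (u - v) ** 2)

def joint_df(u, v, k):
    """Closed form of F(u, v) for 0 <= u, v <= 1, obtained by integrating the density."""
    c_k = 6.0 / (6.0 - k)
    return c_k * (u * v - k * (u * v * (u**2 + v**2) / 3.0 - u**2 * v**2 / 2.0))

def joint_df_numeric(u, v, k, n=200):
    """Midpoint-rule approximation of the integral of the density over [0, u] x [0, v]."""
    hu, hv = u / n, v / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += density((i + 0.5) * hu, (j + 0.5) * hv, k)
    return total * hu * hv
```

Evaluating `joint_df(1.0, 1.0, k)` for any k in [0, 1] should return 1, which confirms the choice of the constant $C_k$.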


424 7. SURVIVAL DISTRIBUTIONS 


3.2 The lifetime of statuses 


Now we consider the lifetime for the joint-life and the last-survivor status, which
corresponds to the r.v.’s $T = \min\{T_1, T_2\}$ and $\overline{T} = \max\{T_1, T_2\}$,
respectively.

In the first case, it is more convenient to deal with the joint survival function since

$$P(T > t) = P(\min\{T_1, T_2\} > t) = P(T_1 > t,\; T_2 > t) = s_{T_1T_2}(t, t). \tag{3.2.1}$$

From (3.2.1) it follows that if $T_1$ and $T_2$ are independent, then

$$P(T > t) = P(T_1 > t)\,P(T_2 > t) = s_{T_1}(t)\,s_{T_2}(t). \tag{3.2.2}$$

In the case of the last-survivor status, it is easier to work with the joint d.f., writing

$$P(\overline{T} \le t) = P(\max\{T_1, T_2\} \le t) = P(T_1 \le t,\; T_2 \le t) = F_{T_1T_2}(t, t). \tag{3.2.3}$$

From (3.2.3) we readily get that if $T_1$ and $T_2$ are independent, then

$$P(\overline{T} \le t) = P(T_1 \le t)\,P(T_2 \le t) = F_{T_1}(t)\,F_{T_2}(t). \tag{3.2.4}$$


Now, we will write the same in actuarial notation. Let ${}_tp_{x:y}$ stand for
$P(T(x:y) > t) = P(T > t)$, and ${}_tp_{\overline{x:y}}$ for
$P(T(\overline{x:y}) > t) = P(\overline{T} > t)$. Following the same logic in notation as
above, let us set ${}_tq_{x:y} = 1 - {}_tp_{x:y}$,
${}_tq_{\overline{x:y}} = 1 - {}_tp_{\overline{x:y}}$. Note also that $T_1 = T(x)$, and
$T_2 = T(y)$. Hence, $F_{T_1}(t) = {}_tq_x$, $F_{T_2}(t) = {}_tq_y$, $s_{T_1}(t) = {}_tp_x$, and
$s_{T_2}(t) = {}_tp_y$. Then (3.2.1)-(3.2.4) may be rewritten as

$${}_tp_{x:y} = s_{T_1T_2}(t, t), \ \text{and in the independence case, } {}_tp_{x:y} = {}_tp_x \cdot {}_tp_y; \tag{3.2.5}$$

$${}_tq_{\overline{x:y}} = F_{T_1T_2}(t, t), \ \text{and in the independence case, } {}_tq_{\overline{x:y}} = {}_tq_x \cdot {}_tq_y. \tag{3.2.6}$$

These formulas must be understood correctly. The factor ${}_tp_x$ in (3.2.5) concerns the
first life, and the factor ${}_tp_y$ concerns the second. These two lives may have different
distributions, so ${}_tp_x$ may not equal ${}_tp_y$ even if x = y. Thus, as a matter of fact,
the quantities ${}_tp_x$ and ${}_tp_y$ should be supplied with additional indices which are
omitted for the sake of simplicity.

Unlike (3.2.5), in the corresponding formula (3.2.2), the distributions of the two lives
are denoted by different symbols, so this formula explicitly reflects the circumstance just
mentioned. The same concerns the factors ${}_tq_x$ and ${}_tq_y$ in (3.2.6).

We can also connect characteristics for the two statuses by making use of the connection
between the joint distribution and corresponding survival functions. Setting u = v = t in
(3.1.3), we obtain from (3.2.5)-(3.2.6) that

$${}_tq_{x:y} = 1 - {}_tp_{x:y} = 1 - s_{T_1T_2}(t, t) = 1 - \left[1 - F_{T_1}(t) - F_{T_2}(t) + F_{T_1T_2}(t, t)\right]$$

$$= F_{T_1}(t) + F_{T_2}(t) - F_{T_1T_2}(t, t) = {}_tq_x + {}_tq_y - {}_tq_{\overline{x:y}}.$$

We rewrite this in the following nice form:

$${}_tq_{x:y} + {}_tq_{\overline{x:y}} = {}_tq_x + {}_tq_y. \tag{3.2.7}$$

Replacing each q above by 1 − p and canceling all ones, we come to the equation

$${}_tp_{x:y} + {}_tp_{\overline{x:y}} = {}_tp_x + {}_tp_y. \tag{3.2.8}$$

For a direct proof of (3.2.7) and (3.2.8) see Exercise 64.
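EXAMPLE 1. Let us revisit Example 3.1-1. In view of (3.1.5) and (3.2.6),

$${}_tq_{\overline{x:y}} = F_{T_1T_2}(t, t) = C_k\left[t^2 - k\left(\frac{2t^4}{3} - \frac{t^4}{2}\right)\right] = C_k\left[t^2 - \frac{k\,t^4}{6}\right]. \tag{3.2.9}$$

Since ${}_tq_{\overline{x:y}}$ is the d.f. of the r.v. $\overline{T} = T(\overline{x:y})$, its
density is

$$f_{T(\overline{x:y})}(t) = \frac{d}{dt}\,{}_tq_{\overline{x:y}} = C_k\left(2t - \frac{2k\,t^3}{3}\right).$$

Consequently, we can compute

$$E\{T(\overline{x:y})\} = \int_0^1 t f_{T(\overline{x:y})}(t)\,dt = \int_0^1 t\,C_k\left(2t - \frac{2k\,t^3}{3}\right)dt = C_k\left(\frac{2}{3} - k\,\frac{2}{15}\right) = \frac{6}{6 - k}\left(\frac{2}{3} - k\,\frac{2}{15}\right).$$

For instance, for k = 0, when the r.v.’s $T_1$ and $T_2$ are independent,
$E\{T(\overline{x:y})\} = \frac{2}{3}$, which certainly can be obtained directly. Indeed, in the
independence case, $T_1$ and $T_2$ are uniform on [0, 1], and in view of (3.2.4),
$F_{\overline{T}}(t) = F_{T_1}(t)\,F_{T_2}(t) = t \cdot t = t^2$. Then
$E\{T(\overline{x:y})\} = \int_0^1 t\,d(t^2) = \frac{2}{3}$.

In the case of dependence, the situation is more complicated. For example, for k = 1, we
have $E\{T(\overline{x:y})\} = \frac{6}{5}\left(\frac{2}{3} - \frac{2}{15}\right) = \frac{16}{25}$,
which is a bit smaller than 2/3.

Consider the joint-life status and, accordingly, T(x : y). In this case, by (3.1.7) and
(3.2.5),

$${}_tp_{x:y} = s_{T_1T_2}(t, t) = 1 - C_k\left[2t - t^2 - k\left(\frac{2t(t^2 + 1)}{3} - t^2 - \frac{2t^4}{3} + \frac{t^4}{2}\right)\right]$$

$$= 1 - C_k\left[2\left(1 - \frac{k}{3}\right)t - (1 - k)\,t^2 - \frac{2k}{3}\,t^3 + \frac{k}{6}\,t^4\right].$$

For example, for k = 1,

$${}_tq_{x:y} = \frac{6}{5}\left[\frac{4}{3}\,t - \frac{2}{3}\,t^3 + \frac{1}{6}\,t^4\right],$$

and the density

$$f_{T(x:y)}(t) = \frac{d}{dt}\,{}_tq_{x:y} = \frac{6}{5}\left[\frac{4}{3} - 2t^2 + \frac{2}{3}\,t^3\right].$$

This leads to

$$E\{T(x:y)\} = \int_0^1 t f_{T(x:y)}(t)\,dt = \int_0^1 t\,\frac{6}{5}\left[\frac{4}{3} - 2t^2 + \frac{2}{3}\,t^3\right]dt = \frac{9}{25}.$$

In Exercise 54, we compare it with the independence case.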
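The two expectations of this example can be evaluated in exact arithmetic for any k. The sketch below integrates the polynomials term by term; it assumes the density $C_k[1 - k(u - v)^2]$ of Example 3.1-1, and the function names are ours. A useful check is that the two expectations add up to $E\{T_1\} + E\{T_2\} = 1$, because max + min = $T_1 + T_2$.

```python
from fractions import Fraction

def c_k(k):
    """Normalizing constant C_k = 6/(6 - k)."""
    return Fraction(6) / (6 - k)

def mean_last_survivor(k):
    """E{max(T1, T2)}: integral of t * C_k (2t - 2k t^3/3) over [0, 1]."""
    return c_k(k) * (Fraction(2, 3) - Fraction(2, 15) * k)

def mean_joint_life(k):
    """E{min(T1, T2)}: integral over [0, 1] of the tail s(t, t)."""
    # Term-by-term integral of 2(1 - k/3)t - (1 - k)t^2 - (2k/3)t^3 + (k/6)t^4 over [0, 1].
    bracket = (1 - Fraction(k) / 3) - (1 - Fraction(k)) / 3 - Fraction(k) / 6 + Fraction(k) / 30
    return 1 - c_k(k) * bracket
```

For k = 1 the functions return 16/25 and 9/25, and for k = 0 they return the familiar 2/3 and 1/3 for two independent uniform lifetimes.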


EXAMPLE 2. Consider a married couple aged 50 and 60. Assume that their future lifetimes are independent, the mortality for the first life follows De Moivre's law with ω = 100, and for the second life, the force of mortality of the future life is close to μ = 0.05.

(a) Find ${}_{20}q_{50:60}$. This is the probability that at least one of the spouses will die within 20 years. De Moivre's law corresponds to a uniform distribution. In Exercise 7b, we establish the simple fact that in this case, the conditional distributions are also uniform. Hence, the distribution of $T_1 = T(50)$ is uniform on [0, 50]. The distribution of $T_2 = T(60)$ is exponential with $E\{T_2\} = 1/\mu = 20$. By (3.2.5),

$$ {}_{20}p_{50:60} = {}_{20}p_{50}\cdot{}_{20}p_{60} = P(T_1 > 20)P(T_2 > 20) = \frac{30}{50}\,e^{-0.05\cdot 20} = 0.6\,e^{-1} \approx 0.22. $$

Then ${}_{20}q_{50:60} = 1 - {}_{20}p_{50:60} \approx 0.78$.

(b) Find ${}_{20}p_{\overline{50:60}}$, the probability that at least one spouse will survive 20 years. Here we use (3.2.6), which implies that

$$ {}_{20}q_{\overline{50:60}} = {}_{20}q_{50}\cdot{}_{20}q_{60} = P(T_1 \le 20)P(T_2 \le 20) = \frac{20}{50}\left(1 - e^{-0.05\cdot 20}\right) = 0.4\,(1 - e^{-1}) \approx 0.25. $$

Then ${}_{20}p_{\overline{50:60}} = 1 - {}_{20}q_{\overline{50:60}} \approx 0.75$.

(c) Find the probability that the sixty-year-old person will die first. It is more convenient to find the complementary probability P(T(60) > T(50)). Using the formula for total probability in its integral form [see (0.7.3.9)], we have

$$ P(T_2 > T_1) = \int_0^{50} P(T_2 > T_1 \mid T_1 = t)\,f_{T_1}(t)\,dt = \frac{1}{50}\int_0^{50} P(T_2 > T_1 \mid T_1 = t)\,dt. $$

Because $T_1$ and $T_2$ are independent, $P(T_2 > T_1 \mid T_1 = t) = P(T_2 > t)$. Consequently,

$$ P(T_2 > T_1) = \frac{1}{50}\int_0^{50} P(T_2 > t)\,dt = \frac{1}{50}\int_0^{50} e^{-0.05t}\,dt = \frac{1}{50\cdot 0.05}\left(1 - e^{-0.05\cdot 50}\right) \approx 0.37. $$

Thus, $P(T_2 < T_1) \approx 1 - 0.37 = 0.63$.
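The three answers of this example are easy to reproduce numerically. The following is a minimal sketch under the example's assumptions (a uniform future lifetime on [0, 50] for the first life, an exponential one with μ = 0.05 for the second, independence); the variable names are ours.

```python
import math

omega, mu = 100, 0.05   # De Moivre limiting age; constant force for the second life
t = 20

p1 = 1 - t / (omega - 50)          # P(T1 > 20), T1 uniform on [0, 50]
p2 = math.exp(-mu * t)             # P(T2 > 20), T2 exponential

joint_q = 1 - p1 * p2              # (a) at least one spouse dies within 20 years
last_p = 1 - (1 - p1) * (1 - p2)   # (b) at least one spouse survives 20 years

# (c) P(T2 > T1) = (1/50) * integral over [0, 50] of exp(-mu * t) dt
p_t2_gt_t1 = (1 - math.exp(-mu * 50)) / (50 * mu)
p_60_first = 1 - p_t2_gt_t1        # the sixty-year-old dies first

print(round(joint_q, 2), round(last_p, 2), round(p_60_first, 2))
```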


EXAMPLE 3. In the case of independent lives, compute ${}_{2|2}q_{60:63}$ in terms of the characteristics $p_x$. We have ${}_{2|2}q_{60:63} = {}_4q_{60:63} - {}_2q_{60:63}$. In view of (3.2.5), ${}_4q_{60:63} = 1 - {}_4p_{60:63} = 1 - {}_4p_{60}\cdot{}_4p_{63}$, and ${}_2q_{60:63} = 1 - {}_2p_{60:63} = 1 - {}_2p_{60}\cdot{}_2p_{63}$.

Next we use that ${}_tp_x = p_xp_{x+1}\cdots p_{x+t-1}$ [see (1.2.8)]. First, we compute ${}_2p_{60} = p_{60}p_{61}$ and ${}_2p_{63} = p_{63}p_{64}$. After that, we write ${}_4p_{60} = {}_2p_{60}\,p_{62}p_{63}$ and ${}_4p_{63} = {}_2p_{63}\,p_{65}p_{66}$.


3. Multiple Life Models 427 


EXAMPLE 4. In the situation of Example 1, let the unit of time be 50 years and k = 1.

(a) Compute the probability that both persons will not survive 30 years, and the first person will die first. The probability that both lives will not survive 30 years is ${}_tq_{\overline{x:y}}$ for t = 3/5. By (3.2.9),

$$ {}_tq_{\overline{x:y}} = C_k\left[t^2 - \frac{k\,t^4}{6}\right] = 1.2\left[0.36 - \frac{0.1296}{6}\right] = 0.40608 \quad \text{if } t = 0.6. $$

Since the joint distribution of T(x) and T(y) is symmetric, it is equally likely that either the first or the second person will die first. So, the answer is $\frac{1}{2}\,{}_tq_{\overline{x:y}} = 0.20304$.

(b) Find the probability that the first person will not survive 40 years, the second will not survive 30 years, and the second person will die first. We adopt the notation of Example 3.1-1 and denote by u and v possible values of the first and the second lifetimes, respectively. The event whose probability we are computing corresponds to the set depicted in Fig. 8.

FIGURE 8. (Axes: u horizontal, with marks at 0, 0.6, 0.8, and 1; v vertical, with marks at 0.6 and 1.)

In this figure, we see that the probability mentioned is equal to $P(T_1 \le 0.8,\ T_2 \le 0.6) - P(T_1 \le 0.6,\ T_2 \le 0.6,\ T_2 > T_1)$. As above, in view of symmetry, the latter probability is equal to $\frac{1}{2}P(T_1 \le 0.6,\ T_2 \le 0.6) = 0.20304$, as we have already computed. To compute the former probability $P(T_1 \le 0.8,\ T_2 \le 0.6)$, we should set in (3.1.5) k = 1, u = 0.8, and v = 0.6, which gives $F_{T_1T_2}(0.8, 0.6) = 0.52224$. So, the answer is 0.52224 − 0.20304 = 0.3192.

EXAMPLE 5 ([153, N32]¹). John, age 40, and Mary, age 50, are independent lives following the same mortality as follows:

Age (x)         40      50      60
${}_{10}q_x$    0.039   0.085   0.192

Calculate the probability that John and Mary both live at least 10 years and then both die during the following 10 years.

Following the same logic as above, we have

$$ \begin{aligned} P(T(40:50) > 10,\ T(\overline{40:50}) \le 20) &= P(T(\overline{40:50}) \le 20 \mid T(40:50) > 10)\,P(T(40:50) > 10) \\ &= P(T(\overline{50:60}) \le 10)\,P(T(40:50) > 10) \\ &= {}_{10}q_{50}\cdot{}_{10}q_{60}\cdot(1 - {}_{10}q_{40})\cdot(1 - {}_{10}q_{50}) \approx 0.01435. \end{aligned} $$


Actually, it is enough to write the last line which explicitly shows the logic of calculations. 
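The last line of this calculation translates directly into a few lines of code. This is only a sketch of the arithmetic, using the three tabulated values of ${}_{10}q_x$ and the independence assumption; the dictionary name `q10` is ours.

```python
# 10-year death probabilities from the table, keyed by age.
q10 = {40: 0.039, 50: 0.085, 60: 0.192}

# Both survive the first 10 years (John from 40, Mary from 50).
both_survive_10 = (1 - q10[40]) * (1 - q10[50])

# Both then die within the next 10 years (John from 50, Mary from 60).
both_die_next_10 = q10[50] * q10[60]

answer = both_survive_10 * both_die_next_10
print(round(answer, 5))
```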


Next we discuss forces of mortality. In Section 1.1, we have already seen that if we consider the minimum of two independent r.v.'s, the corresponding hazard rates are summed. Consequently, for a joint-life status in the independence case, we can write that the force of mortality of min{T(x), T(y)} is

$$ \mu_{x:y}(t) = \mu_x^{(1)}(t) + \mu_y^{(2)}(t) = \mu^{(1)}(x+t) + \mu^{(2)}(y+t). \qquad (3.2.10) $$


Reprinted with permission of the Casualty Actuarial Society. 




We used (1.2.3) above and denoted by $\mu^{(i)}(t)$ the force of mortality for the ith life. See also the remark right after (3.2.6).


EXAMPLE 6. Let two lifetimes be independent, $\mu_x^{(1)}(t) = 0.02$ and $\mu_y^{(2)}(t) = 0.03$. Then for T(x:y) = min{T(x), T(y)}, by (1.1.16), we have $\mu_{x:y}(t) = 0.05$, and hence T(x:y) is an exponential r.v. with E{T(x:y)} = 20. This is a well-known fact in Probability Theory: the minimum of independent exponential r.v.'s is also exponential.

Certainly, we can compute any probabilities connected with T(x:y). For example, by using the result of Example 0.7.3-1, we get that the probability that the first person will die first is $\mu^{(1)}/(\mu^{(1)} + \mu^{(2)}) = 0.4$.
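Both facts of this example, the mean of the minimum and the probability that the first life fails first, can be checked by simulation. The sketch below is a plain Monte Carlo experiment with a fixed seed; the sample size and the seed are arbitrary choices of ours, so the printed values are only approximations of 20 and 0.4.

```python
import random

random.seed(0)                 # fixed seed so that the run is reproducible
mu1, mu2 = 0.02, 0.03          # constant forces of mortality of the two lives
n = 200_000                    # number of simulated couples

total_min = 0.0
first_dies_first = 0
for _ in range(n):
    t1 = random.expovariate(mu1)   # lifetime of the first person
    t2 = random.expovariate(mu2)   # lifetime of the second person
    total_min += min(t1, t2)
    first_dies_first += t1 < t2

mean_min = total_min / n               # should be close to 1 / (mu1 + mu2) = 20
share_first = first_dies_first / n     # should be close to mu1 / (mu1 + mu2) = 0.4
print(mean_min, share_first)
```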


For the last-survivor status, formulas are not as nice as (3.2.10). Nevertheless, probabilities are computable. For brevity, let us set again $T_1 = T(x)$, $T_2 = T(y)$, and $T = \max\{T_1, T_2\}$. Assume the lives to be independent. Then, by (3.2.4), the d.f. of T is $F_T(t) = F_{T_1}(t)F_{T_2}(t)$, and the density

$$ f_T(t) = F_T'(t) = F_{T_1}'(t)F_{T_2}(t) + F_{T_1}(t)F_{T_2}'(t) = f_{T_1}(t)F_{T_2}(t) + F_{T_1}(t)f_{T_2}(t). $$

If we come back to actuarial notation and use (1.2.4), we can write $f_{T_1}(t) = \mu_x(t)\,{}_tp_x = \mu(x+t)\,{}_tp_x$ and $f_{T_2}(t) = \mu_y(t)\,{}_tp_y = \mu(y+t)\,{}_tp_y$. Hence,

$$ f_T(t) = \mu_x(t)\,{}_tp_x\cdot{}_tq_y + \mu_y(t)\,{}_tp_y\cdot{}_tq_x. $$

Note also that $P(T > t) = 1 - {}_tq_{\overline{x:y}} = 1 - {}_tq_x\cdot{}_tq_y$. Eventually,

$$ \mu_{\overline{x:y}}(t) = \frac{f_T(t)}{P(T > t)} = \frac{\mu_x(t)\,{}_tp_x\cdot{}_tq_y + \mu_y(t)\,{}_tp_y\cdot{}_tq_x}{1 - {}_tq_x\cdot{}_tq_y}. \qquad (3.2.11) $$

By virtue of (1.2.3), we can replace $\mu_x(t)$ and $\mu_y(t)$ by μ(x + t) and μ(y + t), respectively.


EXAMPLE 7. Estimate $\mu_{\overline{70:70}}(2)$ proceeding from the data from Table 1.5.1-1, and assuming the lifetimes are independent. By (1.4.4), ${}_2p_{70} = l_{72}/l_{70} \approx 71669/75335 \approx 0.95134$, and hence ${}_2q_{70} \approx 0.048663$. In Table 1.5.1-1, the two types of estimates for $\mu_{70}(2) = \mu(72)$ are 0.027663 and 0.028053, corresponding to the linear and exponential interpolation, respectively. These numbers are close, and we may take their average ≈ 0.027858. Then, (3.2.11) leads to

$$ \mu_{\overline{70:70}}(2) \approx \frac{2\cdot 0.027858\cdot 0.95134\cdot 0.048663}{1 - (0.048663)^2} \approx 0.002585. $$
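Formula (3.2.11) is convenient to wrap in a small function. The sketch below is one possible implementation for independent lives; the function name and argument names are ours, and the inputs are the rounded figures quoted in the example.

```python
def last_survivor_force(mu_x, mu_y, p_x, p_y):
    """Force of mortality of max{T(x), T(y)} at time t for independent lives.

    mu_x, mu_y: individual forces of mortality at time t;
    p_x, p_y:   individual survival probabilities over [0, t].
    """
    q_x, q_y = 1 - p_x, 1 - p_y
    density = mu_x * p_x * q_y + mu_y * p_y * q_x   # f_T(t)
    survival = 1 - q_x * q_y                        # P(T > t)
    return density / survival

p = 71669 / 75335                  # 2-year survival probability at age 70
mu = (0.027663 + 0.028053) / 2     # average of the two interpolation estimates
result = last_survivor_force(mu, mu, p, p)
print(round(result, 6))
```

Note how small the result is compared with the individual force of about 0.028: the last-survivor status fails at time t only if one life has already failed and the other fails now.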


3.3 A model of dependency: conditional independence 


Next, we consider a simple model of dependency between two lives involved in one 
insurance contract. A survey of some dependency structures used in Actuarial Modeling 
and references may be found, e.g., in the book [33] by M. Denuit, J. Dhaene, M. Goovaerts, 
and P. Kaas. 




3.3.1 A definition and the first example 


Assume that the durations of two lives are affected by some common factor. For exam- 
ple, for a married couple it may be identical living conditions. In this case, the lifetimes 
are dependent. Suppose, however, that once the influence of the factor mentioned has been 
specified (for example, the conditions under which the spouses live are known), the life- 
times may be considered independent. 

To model this situation, we consider the r.v.'s T(x) and T(y) as above, and a r.v. or a r.vec. $\zeta$ which is identified with the common factor on which T(x) and T(y) depend. We assume that given $\zeta$, the r.v.'s T(x) and T(y) are independent, which amounts to the following assumption on the conditional distributions. For any sets $B_1$ and $B_2$ from the real line,

$$ P(T(x) \in B_1,\ T(y) \in B_2 \mid \zeta) = P(T(x) \in B_1 \mid \zeta)\,P(T(y) \in B_2 \mid \zeta). \qquad (3.3.1) $$

Random variables having this property are called conditionally independent. To find the joint distribution of T(x) and T(y), we should take the expectation of both sides of (3.3.1) with respect to $\zeta$. By virtue of the general formula of total expectation [see, e.g., (0.7.2.1)], this leads to the relation

$$ P(T(x) \in B_1,\ T(y) \in B_2) = E\{P(T(x) \in B_1 \mid \zeta)\,P(T(y) \in B_2 \mid \zeta)\}. \qquad (3.3.2) $$


EXAMPLE 1. Let T(x) and T(y) be exponentially distributed with a common parameter $\zeta$ which we assume to be random. Suppose that $\zeta$ is uniformly distributed on an interval [a, b], and given $\zeta$, the r.v.'s T(x) and T(y) are independent. (In other words, if we know that $\zeta$ took a particular value μ, we consider T(x) and T(y) independent and exponential with the parameter μ.) Let $f_\zeta(\mu)$ be the density of $\zeta$. Then, by the general formula (0.7.3.9), the joint survival function

$$ \begin{aligned} s_{T(x)T(y)}(u,v) &= P(T(x) > u,\ T(y) > v) = \int_a^b P(T(x) > u,\ T(y) > v \mid \zeta = \mu)\,f_\zeta(\mu)\,d\mu \\ &= \int_a^b P(T(x) > u \mid \zeta = \mu)\,P(T(y) > v \mid \zeta = \mu)\,\frac{1}{b-a}\,d\mu = \frac{1}{b-a}\int_a^b e^{-\mu u}e^{-\mu v}\,d\mu \\ &= \frac{1}{b-a}\int_a^b e^{-\mu(u+v)}\,d\mu = \frac{e^{-a(u+v)} - e^{-b(u+v)}}{(b-a)(u+v)}. \end{aligned} $$

Considering the joint-life status, for T(x:y) = min{T(x), T(y)}, we have

$$ {}_tp_{x:y} = P(T(x:y) > t) = P(T(x) > t,\ T(y) > t) = \frac{e^{-2at} - e^{-2bt}}{2(b-a)t}. \qquad (3.3.3) $$

For example, let a = 0.02 and b = 0.08, so the conditional expected values, $E\{T(x) \mid \zeta = \mu\}$, vary from 1/b = 12.5 to 1/a = 50. If $\zeta$ had assumed just one value equal to its mean 0.05, then we would have had

$$ {}_tp_{x:y} = P(T(x) > t)\,P(T(y) > t) = \exp\{-0.05t\}\exp\{-0.05t\} = \exp\{-0.1t\}. \qquad (3.3.4) $$




However, in case (3.3.3), we have

$$ {}_tp_{x:y} = \frac{1}{0.12\,t}\left(e^{-0.04t} - e^{-0.16t}\right) = \frac{25}{3t}\left(e^{-0.04t} - e^{-0.16t}\right). \qquad (3.3.5) $$

For large t, the last probability becomes substantially larger than (3.3.4). The reader can see this in the table below where we give values of ${}_tp_{x:y}$ for both cases.


| t | 20 | 30 | 40 | 50
| the independence case | 0.135 | 0.049 | 0.018 | 0.007
| the dependency case | 0.170 | 0.081 | 0.041 | 0.022
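The table can be reproduced with a short script. This is a sketch that evaluates the two closed forms, the mixture over a uniform parameter on [a, b] and the product of two exponential survival functions with the mean parameter, for a = 0.02 and b = 0.08; the function names are ours. The printed values agree with the table up to the last digit shown there.

```python
import math

a, b = 0.02, 0.08   # range of the random common parameter

def p_dependent(t):
    """Joint-life survival probability when the parameter is uniform on [a, b]."""
    return (math.exp(-2 * a * t) - math.exp(-2 * b * t)) / (2 * (b - a) * t)

def p_independent(t):
    """Joint-life survival probability when the parameter equals its mean (a + b) / 2."""
    return math.exp(-(a + b) * t)

for t in (20, 30, 40, 50):
    print(t, round(p_independent(t), 4), round(p_dependent(t), 4))
```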


3.3.2 The common shock model 


This model is another particular example of conditional independence. It was first dis- 
cussed in [89], [90] together with other interesting bivariate distributions. 

Assume that the lifetimes of two persons are independent unless a common shock causes 
the death of both. For example, it may be a lethal traffic accident in which two spouses 
are involved. Following traditional notation, denote by Z the moment of the shock, and by 
T* (x), T* (y) the durations of the lives in the absence of the shock. The three r.v.’s defined 
are assumed to be mutually independent. 

Clearly, the lifetime of the first person is the r.v. T(x) = min{T*(x), Z}, and the lifetime of the second is T(y) = min{T*(y), Z}. Given Z, the r.v.'s T(x) and T(y) are independent. The joint survival function

$$ \begin{aligned} s_{T(x)T(y)}(u,v) &= P(T(x) > u,\ T(y) > v) = P(\min\{T^*(x), Z\} > u,\ \min\{T^*(y), Z\} > v) \\ &= P(T^*(x) > u,\ Z > u,\ T^*(y) > v,\ Z > v) = P(T^*(x) > u,\ T^*(y) > v,\ Z > \max\{u,v\}) \\ &= P(T^*(x) > u)\,P(T^*(y) > v)\,P(Z > \max\{u,v\}), \end{aligned} $$

in view of the mutual independence of T*(x), T*(y), and Z. Thus,

$$ s_{T(x)T(y)}(u,v) = s_{T^*(x)}(u)\,s_{T^*(y)}(v)\,s_Z(\max\{u,v\}). \qquad (3.3.6) $$

Then for the marginal survival functions, we have

$$ s_{T(x)}(t) = s_{T(x)T(y)}(t,0) = s_{T^*(x)}(t)\,s_{T^*(y)}(0)\,s_Z(\max\{t,0\}) = s_{T^*(x)}(t)\,s_Z(t). \qquad (3.3.7) $$

Similarly,

$$ s_{T(y)}(t) = s_{T^*(y)}(t)\,s_Z(t). \qquad (3.3.8) $$

Note right away that from (3.3.7) and (1.1.8) it follows that the hazard rate for T(x) is

$$ \mu_{T(x)}(t) = \mu_x(t) = -\frac{d}{dt}\ln\left(s_{T^*(x)}(t)\,s_Z(t)\right) = -\frac{d}{dt}\ln\left(s_{T^*(x)}(t)\right) - \frac{d}{dt}\ln\left(s_Z(t)\right) = \mu_1^*(t) + \mu_Z(t), \qquad (3.3.9) $$

where, for brevity, $\mu_1^*(t)$ denotes the force of mortality for T*(x), and $\mu_Z(t)$ is the hazard rate for Z. Similarly,

$$ \mu_{T(y)}(t) = \mu_2^*(t) + \mu_Z(t), \qquad (3.3.10) $$

where $\mu_2^*(t)$ stands for the force of mortality for T*(y).

The main point here is that the r.v.'s T*(x) and T*(y) are not observable. We can observe when people die and whether it happens due to a common shock, but we do not know how long a person would have lived if the shock had not happened. In other words, we know (more precisely, may know) the marginal distributions of T(x) and T(y), and the distribution of Z, but we do not know the distributions of T*(x) and T*(y). However, in this particular model, we can find these distributions by solving (3.3.7) and (3.3.8) with respect to $s_{T^*(x)}(t)$ and $s_{T^*(y)}(t)$.


EXAMPLE 1. Assume that Z is exponential with parameter λ. Often, the term common shock model is applied to this particular case. From (3.3.7) it follows that ${}_tp_x = s_{T(x)}(t) = s_{T^*(x)}(t)\,e^{-\lambda t}$, and

$$ s_{T^*(x)}(t) = e^{\lambda t}\,s_{T(x)}(t). \qquad (3.3.11) $$

In particular, T*(x) and T(x) may be exponential (that is, $s_{T(x)}(t)$ and $e^{\lambda t}s_{T(x)}(t)$ are exponential functions) only simultaneously.

If T(x) and T*(x) are exponential and μ and $\mu_1^*$ are the corresponding (constant) hazard rates, then (3.3.11) may be rewritten as $\exp\{-\mu_1^*t\} = \exp\{\lambda t\}\exp\{-\mu t\}$. Hence,

$$ \mu_1^* = \mu - \lambda. $$

Consider now the joint-life status and, accordingly, the r.v. T = T(x:y). From (3.3.6), we have

$$ s_T(t) = P(T > t) = s_{T(x)T(y)}(t,t) = s_{T^*(x)}(t)\,s_{T^*(y)}(t)\,s_Z(t). \qquad (3.3.12) $$

Similar to (3.3.9),

$$ \mu_T(t) = \mu_1^*(t) + \mu_2^*(t) + \mu_Z(t). \qquad (3.3.13) $$

We may obtain $s_{T^*(x)}(t)$ from (3.3.7) and $s_{T^*(y)}(t)$ from (3.3.8). Inserting the corresponding expressions into (3.3.12), we readily obtain

$$ s_T(t) = s_{T(x)}(t)\,s_{T(y)}(t)/s_Z(t), \qquad (3.3.14) $$

or in actuarial notation

$$ {}_tp_{x:y} = {}_tp_x\cdot{}_tp_y/s_Z(t). \qquad (3.3.15) $$

As previously mentioned [see the remark right after (3.2.6)], the factor ${}_tp_x$ corresponds to the first life, while ${}_tp_y$ corresponds to the second, so these factors may not coincide even if x = y.


EXAMPLE 2. Let two future lifetimes with the same initial age of 70 belong to different groups (for example, specified by gender), and life tables give estimates $l_{70} = 69000$ and $l_{72} = 64000$ for the first group, and $\tilde{l}_{70} = 65000$ and $\tilde{l}_{72} = 61000$ for the second. Find ${}_2p_{70:70}$ if the distribution of Z is exponential with λ = 0.05. To avoid ambiguities, denote by ${}_t\tilde{p}_y$ the characteristic corresponding to the second life. Then, by (3.3.15),

$$ {}_2p_{70:70} = {}_2p_{70}\cdot{}_2\tilde{p}_{70}/s_Z(2) = {}_2p_{70}\cdot{}_2\tilde{p}_{70}/e^{-2\lambda} = e^{2\lambda}\,(l_{72}/l_{70})\,(\tilde{l}_{72}/\tilde{l}_{70}) \approx 0.962, $$

as is easy to compute. See also Exercise 62.
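For completeness, here is the arithmetic behind this answer as a minimal sketch of (3.3.15) with an exponential shock time. The variable names are ours; the life-table counts and λ are those of the example.

```python
import math

lam = 0.05                        # parameter of the exponential shock time Z

p_first = 64000 / 69000           # 2-year survival probability, first group
p_second = 61000 / 65000          # 2-year survival probability, second group
s_z = math.exp(-lam * 2)          # P(Z > 2)

# Each marginal probability already contains the factor s_z once,
# so the product counts the shock twice and is divided by s_z.
p_joint = p_first * p_second / s_z
print(round(p_joint, 3))
```

Observe that the result exceeds the product of the marginal probabilities, which would be the answer under independence.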


4 EXERCISES 


Section 1 


1. Show by differentiation that (1.1.10) is a solution to (1.1.7). 


2. Suppose that for some population, for people who survived 30 years, the probability of dying 
before 40 years is negligible. How does the survival function look in this case? 


3. Find the force of mortality function for the case of Example 1.1-1. Write s(120).

4. Can the function $|\cos x|\,e^{-x}$ be a survival function?

5. How should the force of mortality be changed so that the share of newborns who survive the first year becomes 10% larger? Find the answer for the case of the first k years. When does the problem have a solution? Is a solution unique?

6. Let $\mu(x) = (1+x)^{-1}$. Find the survival function. What is noteworthy?

7. Let a lifetime X be uniform on [0, ω] (De Moivre's law).

(a) Show that the force of mortality is $\mu(x) = 1/(\omega - x)$.

(b) Prove that T(x) is uniformly distributed on [0, ω − x].

8. Determine the distribution associated with $\mu(x) = 1/(\omega - x)^{\alpha}$ for x < ω and parameters ω and α. For what values of α does the problem make sense?

9. Determine the distribution associated with $\mu(x) = \alpha/(\omega - x)$ for x < ω and parameters ω and α. For what values of α does the problem make sense?

10. Which function, if any, can serve as a force of mortality at least theoretically:
(a) $\mu(x) = xe^{-x}$, (b) μ(x) = 0 if x < 1, and $\mu(x) = x^{-1/2}$ otherwise.

11. In a country, for a typical person of age 50, the probability ${}_{30}p_{50} = 0.5$.

(a) Find the same probability for a country where the force of mortality for people of age fifty and older is three times higher.

(b) Find the same probability for a country where the force of mortality for people of age fifty and older is 0.01 less.

12. Specify t, u, and x for which ${}_{t|u}q_x = q_x$, ${}_{t|u}q_x = {}_{20}p_{30} - {}_{45}p_{30}$.

13. Prove (1.2.8) (a) arguing heuristically, and (b) rigorously, applying induction.

14. Let X be exponential and E{X} = 75. Find ${}_{30|10}q_{50}$.


15. For $s(x) = (1 - x/100)^{1/2}$, find the median (that is, the 0.5-quantile), ${}_{10|10}q_{75}$, and $\mathring{e}_{75}$.

16. (a) In a country, the survival function for women is closely approximated by $s(x) = (1 - x/100)^{1/2}$, while for men, $s_m(x) = (1 - x/90)^{1/2}$. We assume that the probability of the birth of a boy is 1/2. (i) Considering just the ratio of mean values, estimate the average proportion between men and women of age 50. Comment on a way of getting a more precise solution. (ii) Find ${}_{30|10}q_{50}$ for a person aged 50 taken at random.

(b) In general, let a population consist of two groups with conditional survival functions ${}_tp_x^{(1)}$ and ${}_tp_x^{(2)}$, and let the probabilities (weights) that a newborn belongs to these groups be $w_1$ and $w_2$, respectively. Write a general formula for ${}_tp_x$ for a representative of the population, and the weights $w_1(x)$ and $w_2(x)$ of the groups among people of age x.

17. The lifetimes of 100 people are independent with the same survival function $s(x) = (1 - x/100)^{1/2}$. Find the distribution of $\mathcal{L}(60)$, as well as its mean and variance.

18. Following the same logic as in Example 1.2-2, prove that ${}_np_x = {}_mp_x\cdot{}_{n-m}p_{x+m}$ for m < n.

19. If ${}_{20}p_{50} = 0.8$ and ${}_{15}p_{55} = 0.85$, what is ${}_5p_{50}$?

20. Which is larger: the probability to survive to age 70 for a 60-year-old person or for a 65-year-old person (provided that they are chosen at random from a homogeneous population)? State the question in general, using letters rather than numbers.

21. (a) For young people, causes of death are related mostly to accidents. Proceeding from this, explain why the assumption of the approximate constancy of μ(x) may look reasonable for x's between 30 and 40 years.

(b) For what x's do we need to know values of μ(x) to compute ${}_{10}p_{30}$? Which formula shows it explicitly? Relate this question to part (a).

(c) Why is the assumption that μ(x) is constant not unreasonable in Example 1.2-4 but rather artificial in Example 1.2-6?

22. Assume that the lifetime of 30% of newborns has a constant mortality rate of 1/50 year$^{-1}$, while for 70%, the rate is 1/80 year$^{-1}$. In other words, the distribution of X is a mixture of exponential distributions. Is it true for T(x)? Find the distribution of T(20). Interpret your results from a common sense point of view.

23. Estimate $\mathring{e}_{50}$ in the case of $s(x) = \exp\{-(x/70)^2\}$. (Advice: Use (1.2.10), and observe that the resulting integrals are relevant to the standard normal distribution function.)

24. Let μ(x) = 1/[2(100 − x)] for x < 100. Find $\mathring{e}_{75}$.

25. Prove that $e_x = p_x(1 + e_{x+1})$.

26. By using the results of Exercises 13 and 25, find ${}_2p_{60}$ given $e_{60}$, $e_{61}$, and $e_{62}$.

27. Show that
$$ {}_np_x = \frac{e_x\,e_{x+1}\cdots e_{x+n-1}}{(1 + e_{x+1})(1 + e_{x+2})\cdots(1 + e_{x+n})}. $$

28. Show that if X is exponential with E{X} = r, then K(x) has the geometric distribution (0.3.1.9) with parameter $p = 1 - e^{-\mu}$, where the (constant) force of mortality μ = 1/r. (Advice: First, recall that T(x) is distributed as X (why?). Secondly, look up Example 1.3-1.)

29. Find Var{T(10)} in the situation of Example 1.2-4.

30. Show that $e_x \le \mathring{e}_x$ for any distribution of T. (Hint: The problem is very simple.)


31. In Example 1.3-1, show that $e_x \sim r$ as r → ∞. Can we say in general that if $e_x$ is large, then $e_x \approx \mathring{e}_x$?

32. Show that the second term in (1.6.4) corresponds to the survival function of the same type as in Example 1.1-4.

33. By using Table 1.5.1-1, estimate: ${}_{40}p_{40}$, μ(20), the probability for a person of age 40 to die between 50 and 60. Compare ${}_{2|}q_0$ and ${}_{2|}q_{20}$.

34. Give a common sense explanation of the facts mentioned in Example 1.5.1-1e.

35. Graph values $d_x$ from Table 1.5.1-1 for x = 90, ..., 99. Do not use the data for ages 100+ since they concern more than one year. Add a trend line. The corresponding graph obtained by Excel is shown in Fig. 9. We see that the graph is close to linear. Explain that in this case, we can approximate the density of T(90) by a linear decreasing function. Extrapolate (extend) the graph as a linear function, and find the age a at which the corresponding line crosses the x-axis. Show that, if we do not take into account people who survive the age of a, then the density $f_{T(90)}(t)$ may be closely approximated by the linear function 2(a − t)/a. What percent of people were not taken into account? Show that if we do take into account people who survive a, we should approximate $f_{T(90)}(t)$ by another function for x > 100. For example, for a rough approximation, it may be a linear function with another slope than the one for 90 ≤ x ≤ 99.

FIGURE 9. The broken line corresponds to a trend line. (Legend: the data; the trend. Horizontal axis marked 1 to 10; vertical axis marked 0 to 3000.)

36. Graph values $l_x$ from Table 1.5.1-1 for x = 75, ..., 99. Do not use the data for ages 100+ since they concern more than one year. Add a trend line. Discuss the possibility of approximating the distribution of T(75) by a uniform distribution.

37. Let s(60) = 0.830, s(61) = 0.825, s(62) = 0.820. Using the linear and exponential interpolations described in Section 1.5.2, find s(60.5) and s(61.3). Using both methods, estimate μ(60.5) and the probability that a person whose age is 60 years 3 months will live at least one year more.

38. Show that (a) if the distribution of X is uniform on an interval [0, ω], then the linear interpolation (1.5.2) leads to precise values of s(x + t); (b) the exponential interpolation leads to precise values of s(x + t) if X has an exponential distribution. Are these distributions the only distributions with such properties?

39. In a country, for a person chosen at random, the survival probabilities s(20) ≈ 0.97, s(50) ≈ 0.91, and s(60) ≈ 0.65. Find the parameters of the Makeham law (1.6.3). (Advice: Compute first the logarithms of s(·).)


Section 2*

40. In the double decrement scheme, $\mu_x^{(1)}(t) = t^2$ and $\mu_x^{(2)}(t) = 2t + 1$. For which t's, given that death occurred at time t, is the probability that it occurred from cause 1 larger than the corresponding probability for cause 2?

41. Let $q_x^{(1)} = 0.05$, $q_{x+1}^{(1)} = 0.06$, $q_{x+2}^{(1)} = 0.07$, $q_x^{(2)} = 0.01$, $q_{x+1}^{(2)} = 0.02$, and $q_{x+2}^{(2)} = 0.03$. Find ${}_3q_x^{(1)}$ and $P(J = 1 \mid 2 < T \le 3)$.

42. Give a heuristic proof of Proposition 1 writing T ≈ t instead of t < T(x) ≤ t + dt, and writing the r.-h.s. of (2.1.7) for j = 1 as $P(T \approx t,\ J = 1 \mid T > t) = P(\min\{T_1, T_2\} \approx t,\ T_1 < T_2 \mid T > t)$.

43. In the situation of Example 2.3-1, find 2/470, Var{ py) } (where pe) — = De) ), and P(DS) = = 3) and compare the last result with its Poisson approximation.

44. What data should be added to Table 1.5.1-1 in order that we will be able to estimate $\mu_x^{(j)}(t)$?

45. Show that if the functions $\mu_x^{(j)}(t)$ are constant, then the r.v.'s T(x) and J are independent.

46. In the case $\mu_x^{(j)}(t) = j/20$ for j = 1, 2, find $f_{TJ}(t, j)$, the marginal distribution of T, and P(J = j).

47. In the case $\mu_x^{(j)}(t) = jt^{j-1}/20$ for j = 1, 2, find $f_{TJ}(t, j)$, the marginal distribution of T, and P(J = j).

48. The force of mortality for accidental death is ux (t) = ua z; for death from other causes, it is u(t t)= X30=1) IT 7 y (for all t for which it makes sense). Find ${}_{10}p_x$ and the probability that death will be a result of an accident.

49. There are three causes for the termination of an insurance on (x). The corresponding forces of decrement are us? (t) = aa w(t t)= ee IGO and MS (t t)= 0.04e2", Find ${}_{10}p_x$, and estimate the probability that death will be a result of the first cause. Compare also the mortality rates above with those in Section 1.6.

50. Probabilities $q_{x+t}^{(i)}$, i = 1, 2, are given in the following table.

| t | 0 | 1 | 2
| $q_{x+t}^{(1)}$ | 0.02 | 0.03 | 0.04
| $q_{x+t}^{(2)}$ | 0.03 | 0.04 | 0.05

Provide an Excel worksheet for estimating q® and ; py-


Section 3*

51. As was stated, for example, in (0.1.3.12), if F(u, v) is a joint d.f., the density $f(u,v) = \dfrac{\partial^2F(u,v)}{\partial u\,\partial v}$. Write f(u, v) in terms of the survival function s(u, v).

52. Let h(t) be a decreasing function such that h(0) = 1 and h(∞) = 0. For example, $h(t) = e^{-t}$. Any such function may be the tail (survival) function for a positive r.v. T. That is, there exists a r.v. T such that P(T > t) = h(t). Is it true that the function s(u, v) = h(u + v) may be the joint survival function for a r.vec. $(T_1, T_2)$? For example, is the function $e^{-(u+v)^2}$ a survival function?

53. Carry out all calculations in Example 3.1-1.


54. In Example 3.2-1, compare E{T(x:y)} for k = 1 and k = 0.

55. For two independent lives, ${}_{30}p_{40:50} = 0.9$. Assume that life conditions for both lives changed in a way that the new forces of mortality for each life decreased by 10%. Find the new value for the probability mentioned.

56. For two independent lives with the future lifetimes $T_1$ and $T_2$, the characteristics ${}_tp^{(1)} = P(T_1 > t)$ and ${}_tp^{(2)} = P(T_2 > t)$ are given. Find the probabilities that (a) both will survive n years; (b) exactly one will survive n years; (c) no one will survive n years; (d) at least one will survive n years; (e) both will die in the nth year; (f) no one will die in the nth year; (g) exactly one will die in the nth year; (h) one will die in the nth year, and one in the next year.

57. For a husband and wife aged 50 and 40, respectively, the survival functions are the same as in Exercise 16. Under the independence assumption, find ${}_{20}p_{50:40}$, ${}_{50}p_{50:40}$, ${}_{20}p_{\overline{50:40}}$, ${}_{50}p_{\overline{50:40}}$.

58. Consider a husband and a wife aged 50 and 40, respectively. Assume that the lives are independent, for the husband the mortality is 10% higher than that for the average distribution from Table 1.5.1-1, and for the wife 10% lower. Find ${}_{20}p_{50:40}$ and ${}_{20}p_{\overline{50:40}}$.

59. Show that for a status of three independent lives (x), (y), (z), similarly to (3.2.5) and (3.2.6),

$$ {}_tp_{x:y:z} = {}_tp_x\cdot{}_tp_y\cdot{}_tp_z, \qquad {}_tq_{\overline{x:y:z}} = {}_tq_x\cdot{}_tq_y\cdot{}_tq_z. $$

Does the remark after equations (3.2.5) and (3.2.6) apply to this case?

60. Show that (3.3.15) implies (3.2.5).

61. Similar to Example 3.3.1-1, consider a random parameter $\zeta$ uniformly distributed on [1, 2] (in appropriate units). Given $\zeta$, the random lifetimes $T_1$ and $T_2$ are independent and exponential with the forces of mortality $\zeta$ and $\zeta + 0.5$, respectively. Find the probability that the first person will die before the second.

62. Consider the situation of Example 3.3.2-2 under the condition that T(x), T(y) are exponential.

(a) Show that in order to find ${}_kp_{70:70}$ for any integer k, it suffices to know $l_{70}$, $l_{71}$, $\tilde{l}_{70}$, $\tilde{l}_{71}$, and λ.

(b) Show that, in general, to estimate ${}_kp_{x:y}$ for any k, it suffices to know $p_{x_1}$, $p_{y_1}$, and $p_{x_2:y_2}$ for some particular $x_1$, $y_1$, $x_2$, and $y_2$.

63. In a country, for a husband and wife of ages 25 and 20, the distributions of the lifetimes have the same mortality rate μ(x) ≈ 0.001 for x ∈ [20, 40]. Considering the common shock model and assuming that the common shock time Z has the exponential distribution with E{Z} = 2000, find ${}_{15}q_{25:20}$ and compare the answer with the value of the same characteristic in the case of the absence of common shock. Find ${}_{15}q_{\overline{25:20}}$.

64. Let as usual $1_A$ be the indicator of an event A. Prove (3.2.7) proceeding from the relation $\varphi(\min\{T_1, T_2\}) + \varphi(\max\{T_1, T_2\}) = \varphi(T_1) + \varphi(T_2)$, and setting the function $\varphi(T) = 1_{\{T \le t\}}$ for a fixed t. Which function $\varphi(T)$ should we choose to prove directly (3.2.8)?


Chapter 8 


Life Insurance Models 


In this chapter, we consider an insurance which provides for payment of a single benefit 
(a sum insured) at some random time in the future. As a rule, we will talk about life 
insurance, although the same model may be applied in other situations (for example, for 
describing contracts offering warranties for machines). 


1 A GENERAL MODEL 
1.1 The present value of a future payment 


In all models below, the initial time t = 0 is the time of policy issue, and the symbol Ψ stands for the time of benefit payment¹.

In the case of the life insurance for a person of age x, the r.v. Ψ may coincide with the moment of death T = T(x) (insurances payable at the moment of death), or it may differ from T. For example, if the benefit is paid at the end of the year of death, then the payment time is K + 1, where the curtate time K = [T], the integer part of T. In the case of the so-called n-year term life insurance contract, where the insurance provides a payment only if the insured dies within n years, the moment of payment Ψ = T if T ≤ n. If T > n, then no payment is provided, which we will indicate by setting Ψ = ∞. We will see that such a way of writing leads to correct results.

We will consider various types of life insurance in Section 2.

The main feature of any life insurance contract consists in the time lag between the moment of payment and the time of policy issue. If Ψ assumes a value t, then the present value of the payment of a unit of money (from the standpoint of the initial time) is equal to a discount factor $v_t$; see Section 0.8.3 for detail.

In this and in the next chapters, for discrete time t = 0, 1, 2, ..., we adopt the model

$$ v_t = v^t, \qquad (1.1.1) $$

where the discount $v = v_1$ is the discount for a unit time interval. (See also Section 0.8.)


¹ The letter Ψ is rarely used for time moments in notation, but we have a shortage of symbols since most of them are used traditionally for fixed purposes. Also, not every letter is good for denoting a time moment. For example, N would be good for an integer r.v., but traditionally it is not used for continuous variables. So, the reader is invited to adopt Ψ as the notation for the moment of payment.




438 8. LIFE INSURANCE MODELS 


In the case of continuous time, we assume that interest is compounded continuously and write

$$ v = e^{-\delta} \quad \text{and} \quad v_t = e^{-\delta t} \ \text{ for } t \ge 0, \qquad (1.1.2) $$

where δ is the unit-time-interval interest rate, or the force of interest. (See again Section 0.8.3.) In financial or insurance practice, time is usually measured in years, and in this case, δ is an annual rate.

To make our exposition unified, we set $v = e^{-\delta}$ in the discrete time case too, viewing δ as a positive parameter. Under this assumption, in both discrete and continuous time cases, (1.1.1) and (1.1.2) are true, and we can use whichever is more convenient.

The relations (1.1.1) and (1.1.2) presuppose, in particular, that the interest rate is certain (non-random) and does not change in time. This is a serious simplification. In reality, interest rates and investment earnings are neither certain nor constant. So, in general, δ = δ(t) is a random process. We slightly touched on this question in Section 0.8.1, but modeling of insurance processes with a random or varying interest is beyond the scope of this book.

If the moment of payment Ψ is random, then the present value of the future payment of a unit of money is also random and, by (1.1.2), is equal to the r.v.

$$ Z = e^{-\delta\Psi} = v^{\Psi}. \qquad (1.1.3) $$

The expected value of Z is traditionally denoted by A, frequently with indices and other signs, depending on the type of contract under consideration. Thus,

$$ A = E\{Z\} = E\{e^{-\delta\Psi}\}. $$

The quantity A is called the actuarial present value (APV) or the net single premium of the contract. The term 'single' means that we are talking about a premium paid one time at the moment of policy issue. The term 'net' is related to the fact that such a premium does not reflect the riskiness of the contract. As we know, and as we will repeatedly see again, for the company to fulfill its obligation with a sufficiently large probability, the real premium should be larger than the net premium; in other words, the premium should include a security loading. We will consider this issue in Chapter 10.

An important fact, and a nice one from a mathematical point of view, is that, as a function of δ, the APV A is the moment generating function (m.g.f.) of Ψ. More precisely,

$$ A = E\{e^{-\delta\Psi}\} = M_\Psi(-\delta), \qquad (1.1.4) $$

where $M_\Psi(\cdot)$ is the m.g.f. of Ψ.

By (1.1.3), the lth moment

$$ E\{Z^l\} = E\{e^{-l\delta\Psi}\} = M_\Psi(-l\delta). $$

So, if we know the m.g.f. $M_\Psi(\cdot)$, then we know all moments of Z. In particular, $E\{Z^2\} = M_\Psi(-2\delta)$, and

$$ \mathrm{Var}\{Z\} = E\{Z^2\} - (E\{Z\})^2 = M_\Psi(-2\delta) - (M_\Psi(-\delta))^2. \qquad (1.1.5) $$

We will refer to this as the rule of double rate.


1. A General Model 439 


The first example below concerns the whole life insurance of a life-age-x. This type of 
insurance provides for payment of a unit of money at the moment of death. In this case, 


Y = T (x), and the traditional notation for the APV is Ax. 


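EXAMPLE 1. Consider the whole life insurance of a life-age-\(x\) in the case when mortality follows the De Moivre law (7.1.6.1) with \(\omega = 100\). Then \(T(x)\) is uniformly distributed on \([0, \omega - x]\) (see Exercise 7-7b), and the m.g.f. \(M_\Psi(z) = M_{T(x)}(z) = (e^{(\omega-x)z} - 1)/[(\omega - x)z]\) in view of (0.4.3.4). In accordance with (1.1.4), 

\[ \bar{A}_x = \frac{e^{-(\omega-x)\delta} - 1}{-(\omega-x)\delta} = \frac{1 - e^{-(\omega-x)\delta}}{(\omega-x)\delta} = \frac{1 - e^{-s}}{s}, \tag{1.1.6} \]

where \(s = (\omega - x)\delta\). By (1.1.5), 

\[ \mathrm{Var}\{Z\} = \frac{1 - e^{-2(\omega-x)\delta}}{2\delta(\omega-x)} - \left(\frac{1 - e^{-(\omega-x)\delta}}{\delta(\omega-x)}\right)^2 = \frac{1 - e^{-2s}}{2s} - \left(\frac{1 - e^{-s}}{s}\right)^2. \tag{1.1.7} \]

Let \(x = 60\) and \(\omega = 100\), and let us adopt \(\delta = 0.04\) as the average annual force of interest for the remaining 40 years. In this case, \(s = 1.6\), and \(\bar{A}_{60} = (1 - e^{-1.6})/1.6 \approx 0.499\). As may be easily computed using (1.1.7), \(\mathrm{Var}\{Z\} \approx 0.050\), and hence the standard deviation \(\sigma_Z \approx 0.225\). 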


Thus, the present value of the obligation to pay $1 at the moment of death is, on the average, 
about 50¢ with a standard deviation of 22.5¢. 
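The figures of Example 1 can be reproduced with a few lines of code. The following is a minimal sketch, assuming the uniform (De Moivre) model of the example; the function name `de_moivre_whole_life` is hypothetical and is introduced only for illustration.

```python
import math

def de_moivre_whole_life(x, omega, delta):
    """APV and variance of Z = exp(-delta*T) when T is uniform on [0, omega - x]."""
    s = (omega - x) * delta
    apv = (1 - math.exp(-s)) / s                 # m.g.f. of T at -delta, see (1.1.4)
    second = (1 - math.exp(-2 * s)) / (2 * s)    # same m.g.f. at -2*delta (double rate)
    return apv, second - apv ** 2                # variance by (1.1.5)

apv, var = de_moivre_whole_life(x=60, omega=100, delta=0.04)
print(round(apv, 4), round(var, 4), round(math.sqrt(var), 4))
```

The second moment is obtained simply by calling the same m.g.f. at the doubled force of interest, which is exactly the rule of double rate.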


Next, we consider a whole life insurance with benefits payable at the end of the year of 
death. In this case, \(\Psi = K(x) + 1\). The APV for this type of insurance is denoted by \(A_x\) 
(without a bar). 


EXAMPLE 2. Consider the situation of the previous example but with benefits payable at the end of the year of death. Assume that \(x\) and \(\omega\) are integers. Since \(T\) is uniformly distributed on \([0, \omega - x]\), the r.v. \(\Psi = K(x) + 1\) assumes values \(1, \dots, \omega - x\) with the same probability \(q = 1/(\omega - x)\). For example, if \(\omega = 100\) and \(x = 60\), then \(\Psi\) equals \(1, \dots, 40\) with the same probability 1/40. 

The m.g.f. 

\[ M_\Psi(z) = E\{e^{z\Psi}\} = \sum_{k=1}^{\omega-x} e^{zk} q = q \sum_{k=1}^{\omega-x} (e^{z})^k = q\, e^{z}\, \frac{1 - e^{z(\omega-x)}}{1 - e^{z}} \]

as a geometric series. Setting \(v = e^{-\delta}\), we get that 

\[ A_x = M_\Psi(-\delta) = q\, e^{-\delta}\, \frac{1 - e^{-\delta(\omega-x)}}{1 - e^{-\delta}} = \frac{v(1 - v^{\omega-x})}{(\omega-x)(1-v)}. \]

For the values of \(\omega\), \(x\), and \(\delta\) in Example 1, \(A_x \approx 0.4889\). In Exercise 2, the reader is invited to explain why \(A_x < \bar{A}_x\) in this case, and what we can expect in the general case. In Exercise 3, we compute \(\mathrm{Var}\{Z\}\). 
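As a quick check of Example 2, the sketch below computes the discrete-time APV both by direct summation over the equally likely payment times and by the closed geometric-series form. It assumes the De Moivre setting with integer \(x\) and \(\omega\); the function name is hypothetical.

```python
import math

def de_moivre_discrete_apv(x, omega, delta):
    """APV when the payment time K(x)+1 is uniform on {1, ..., omega - x}."""
    n = omega - x
    v = math.exp(-delta)                              # annual discount factor
    direct = sum(v ** k for k in range(1, n + 1)) / n # E{v^(K+1)} term by term
    closed = v * (1 - v ** n) / (n * (1 - v))         # geometric-series form
    return direct, closed

direct, closed = de_moivre_discrete_apv(x=60, omega=100, delta=0.04)
print(round(direct, 4), round(closed, 4))
```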


EXAMPLE 3. Let \(\mu(x) = \mu\). Then \(X\) is exponential, and by the lack-of-memory property, \(T(x)\) is also exponential with the same parameter \(\mu\). Then, by (0.4.3.5), \(M_{T(x)}(z) = \mu/(\mu - z)\), and by (1.1.4) and (1.1.5), 

\[ \bar{A}_x = \frac{\mu}{\mu+\delta} \quad\text{and}\quad \mathrm{Var}\{Z\} = \frac{\mu}{\mu+2\delta} - \left(\frac{\mu}{\mu+\delta}\right)^2. \tag{1.1.8} \]

Neither characteristic depends on \(x\). 


If we know the distribution of \(\Psi\), then we can find not only moments but the distribution 
of \(Z\) itself. If \(F_\Psi(t)\) is the d.f. of \(\Psi\), then from (1.1.3) it follows that the d.f. 

\[ F_Z(x) = P(Z \le x) = P(e^{-\delta\Psi} \le x) = P(\Psi \ge -(\ln x)/\delta) = 1 - F_\Psi(-(\ln x)/\delta). \tag{1.1.9} \]

By (1.1.3), \(0 \le Z \le 1\). Therefore it makes sense to consider above only \(x \in [0,1]\), and in 
this case, \((-\ln x)\) is non-negative. 


EXAMPLE 4. Assume that the survival function \(s(x) = (1 - x/100)^{1/2}\); see also Example 7.1.1-1. Let us consider the distribution of \(Z\) for the whole life insurance with benefits payable upon death for a life-age-50. 

In this case, \(\Psi = T(50)\), and [see also (7.1.2.1)] 

\[ {}_tp_{50} = P(T(50) > t) = \frac{s(t+50)}{s(50)} = \sqrt{1 - t/50} \quad\text{for } t \in [0,50]. \]

Then \(F_\Psi(t) = 1 - \sqrt{1 - t/50}\), and by (1.1.9), 

\[ F_Z(x) = \sqrt{1 + (\ln x)/(50\delta)}. \]

For instance, if \(\delta = 0.04\), then the probability that the present value of $1 to be paid at the moment of death is less than 50¢ is \(\sqrt{1 + \ln(1/2)/(50 \cdot 0.04)} = \sqrt{1 - (\ln 2)/2} \approx 0.808\). 
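The distribution function of Example 4 is easy to tabulate. The sketch below assumes the survival function of the example, so that \(P(T(50) > t) = \sqrt{1 - t/50}\) on \([0, 50]\); the function name `cdf_Z` and the explicit handling of the end points are additions made here for illustration.

```python
import math

def cdf_Z(x, delta, horizon=50.0):
    """d.f. of Z = exp(-delta*T) when P(T > t) = sqrt(1 - t/horizon), 0 <= t <= horizon."""
    if x >= 1.0:
        return 1.0                        # Z never exceeds 1
    if x <= math.exp(-delta * horizon):
        return 0.0                        # Z is never below exp(-delta*horizon)
    return math.sqrt(1.0 + math.log(x) / (horizon * delta))

prob_half = cdf_Z(0.5, delta=0.04)
print(round(prob_half, 3))
```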


1.2 The present value of payments for a portfolio of many policies 


Consider a portfolio of \(n\) contracts with a benefit of one unit of money for each contract, 
and with the payment times \(\Psi_1, \dots, \Psi_n\), respectively. Let \(Z_i = e^{-\delta\Psi_i}\), the present value of the 
payment corresponding to the \(i\)th contract. Then the present value for the whole portfolio 
is the r.v. 

\[ Z = Z_1 + \dots + Z_n. \tag{1.2.1} \]


This follows from the very concept of present value. Though the payments are provided 
at different moments of time, we evaluate these payments with respect to one moment, 
namely, the initial moment of time. So, all Z’s correspond to the same moment of time, and 
therefore we can sum them up. 

While all of this is true, it is still not very transparent. Let us consider it in more detail, observing 
how the process of payments runs in time. Assume that payments are to be withdrawn 
from an investment fund growing at a rate \(\delta\). Let \(G\) be the initial amount of this fund. 

The variables \(\Psi_1, \dots, \Psi_n\) may be arbitrary and in general random, so the fact that \(\Psi_1\) corresponds to the first contract does not mean that the payment for this contract comes first. Denote by \(\Psi_{(1)}, \dots, \Psi_{(n)}\) the same time moments \(\Psi_1, \dots, \Psi_n\) arranged in ascending order. In particular, \(\Psi_{(1)} = \min\{\Psi_1, \dots, \Psi_n\}\), the moment of the first payment, and \(\Psi_{(n)} = \max\{\Psi_1, \dots, \Psi_n\}\), the moment of the last payment. In statistics, such r.v.’s are called order statistics. 

Since the fund is growing at the rate \(\delta\), at the moment of the first payment, \(\Psi_{(1)}\), the fund will have amount \(Ge^{\delta\Psi_{(1)}}\). For the company to fulfill its obligation, this amount should not be less than the unit of money that the company must pay at this moment. In other words, we should have \(Ge^{\delta\Psi_{(1)}} \ge 1\). Let us rewrite it as 

\[ G \ge e^{-\delta\Psi_{(1)}}. \tag{1.2.2} \]

At the moment \(\Psi_{(1)}\), the company will pay 1, and after that the fund will become equal to \(Ge^{\delta\Psi_{(1)}} - 1\). 
At the moment of the second payment, \(\Psi_{(2)}\), the fund will grow to the amount 

\[ (Ge^{\delta\Psi_{(1)}} - 1)e^{\delta(\Psi_{(2)} - \Psi_{(1)})} = Ge^{\delta\Psi_{(2)}} - e^{\delta(\Psi_{(2)} - \Psi_{(1)})}. \]

The fund will be solvent at this moment if \(Ge^{\delta\Psi_{(2)}} - e^{\delta(\Psi_{(2)} - \Psi_{(1)})} \ge 1\). Dividing this inequality by \(e^{\delta\Psi_{(2)}}\), we represent it as 

\[ G \ge e^{-\delta\Psi_{(1)}} + e^{-\delta\Psi_{(2)}}. \tag{1.2.3} \]

We see that (1.2.2) follows from (1.2.3), so it suffices to consider only the latter condition. 
Similarly, for the time of the third payment, we come to the condition 

\[ G \ge e^{-\delta\Psi_{(1)}} + e^{-\delta\Psi_{(2)}} + e^{-\delta\Psi_{(3)}}, \]

and for the \(k\)th payment, we get 

\[ G \ge e^{-\delta\Psi_{(1)}} + \dots + e^{-\delta\Psi_{(k)}}. \]

Each time, the solvency condition includes the corresponding conditions for the previous 
payments. For the last (\(n\)th) payment we have 

\[ G \ge e^{-\delta\Psi_{(1)}} + \dots + e^{-\delta\Psi_{(n)}}. \tag{1.2.4} \]


This condition ensures the possibility of the \(n\)th payment, and of all payments before. 
The next step is nice. Since \(\Psi_{(1)}, \dots, \Psi_{(n)}\) are the same variables \(\Psi_1, \dots, \Psi_n\), simply rearranged in ascending order, we can write that 

\[ e^{-\delta\Psi_{(1)}} + \dots + e^{-\delta\Psi_{(n)}} = e^{-\delta\Psi_1} + \dots + e^{-\delta\Psi_n} = Z_1 + \dots + Z_n. \]

Consequently, the condition (1.2.4) may be written as 

\[ G \ge Z_1 + \dots + Z_n. \]

Thus, for the fund to be solvent, it is necessary and sufficient that the initial fund amount 
\(G\) is equal to or greater than the sum \(Z_1 + \dots + Z_n\). This precisely means that the present 
value of all future payments equals \(Z_1 + \dots + Z_n\). If the \(Z\)’s are random, then the present 
value is also random. 

In conclusion, it makes sense to emphasize that the above argument is not necessary for 
proving (1.2.1); this relation does follow from the very definition of present value. We just 
gave more insight into the nature of this notion. 
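The argument above can also be checked numerically. The sketch below runs a fund forward in time, paying one unit at each payment time, and compares the outcome with the sum of the discounted payments. The exponential payment times with rate 0.05, the portfolio size of 20, and the function name `fund_is_solvent` are arbitrary choices made only for this illustration.

```python
import math
import random

def fund_is_solvent(G, times, delta):
    """Grow the fund at force of interest delta and pay 1 at each payment time."""
    fund, now = G, 0.0
    for t in sorted(times):                    # payments come in ascending order
        fund *= math.exp(delta * (t - now))    # growth since the previous payment
        now = t
        if fund < 1.0 - 1e-12:                 # cannot pay the unit (small tolerance)
            return False
        fund -= 1.0
    return True

random.seed(1)
delta = 0.04
times = [random.expovariate(0.05) for _ in range(20)]
pv = sum(math.exp(-delta * t) for t in times)  # Z_1 + ... + Z_n

print(fund_is_solvent(1.001 * pv, times, delta))  # slightly above the present value
print(fund_is_solvent(0.999 * pv, times, delta))  # slightly below the present value
```

A fund slightly above the sum of the discounted payments survives every payment, while a fund slightly below it fails no later than at the last one.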




Let us come back to (1.2.1). Assume that the contracts are signed at the same moment 
of time, and the r.v.’s \(\Psi_1, \dots, \Psi_n\) are independent and have the same distribution. We can 
interpret this as if the contracts are of the same type and act independently. 

Let \(A = E\{Z_i\}\), the actuarial value of each contract, and \(\sigma^2 = \mathrm{Var}\{Z_i\}\). Then, in view of 
(1.2.1), \(E\{Z\} = nA\) and \(\mathrm{Var}\{Z\} = n\sigma^2\). 

Denote by \(h\) the single premium per one contract, paid at the initial time. Then \(H = hn\) 
is the total amount of premiums paid. 

As we did before repeatedly, we can estimate the value of \(H\) that is sufficient for the 
company to fulfill its obligations with a given probability \(\beta\). Calculations are very similar 
to what we did, for example, in Section 2.3.1.1. The probability under consideration is 
\(P(Z \le H)\). In order for it to be equal to \(\beta\), we should have 


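\[ \beta = P(Z \le H) = P\left(\frac{Z - E\{Z\}}{\sqrt{\mathrm{Var}\{Z\}}} \le \frac{H - E\{Z\}}{\sqrt{\mathrm{Var}\{Z\}}}\right) = P\left(\frac{Z - nA}{\sigma\sqrt{n}} \le \frac{H - nA}{\sigma\sqrt{n}}\right). \]

Since \(Z\) is the sum of i.i.d. r.v.’s, for large \(n\), we can use normal approximation, writing 

\[ \beta = P\left(\frac{Z - nA}{\sigma\sqrt{n}} \le \frac{H - nA}{\sigma\sqrt{n}}\right) \approx \Phi\left(\frac{H - nA}{\sigma\sqrt{n}}\right). \]

We obtain from this that \((H - nA)/(\sigma\sqrt{n}) \approx q_{\beta s}\), where \(q_{\beta s}\) is the \(\beta\)-quantile of the standard normal distribution. From the last relation, we get the estimate 

\[ H \approx nA + q_{\beta s}\sigma\sqrt{n}, \quad\text{and}\quad h \approx A + \frac{q_{\beta s}\sigma}{\sqrt{n}}. \tag{1.2.5} \]

More precise estimates may be obtained similarly to what we did in 2.3.2. 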


EXAMPLE 1 ([151, N4]²). A fund will pay death benefits of $10,000 on each of 900 
independent lives of age 30. You are given: \(\delta = 0.04\), \(\mu = 0.01\), and the death benefits are 
payable at the moment of death. 

The initial amount of the fund is established so that the probability that sufficient funds 
will be in hand to withdraw the benefit payment at the death of each individual is 0.95. 
Calculate the initial fund amount. 

We are considering the continuous time case, and the lifetimes are exponentially distributed. In accordance with (1.1.8), \(A = \frac{\mu}{\mu+\delta} = 0.2\), \(E\{Z^2\} = \frac{\mu}{\mu+2\delta} = \frac{1}{9}\), and \(\mathrm{Var}\{Z\} = \frac{1}{9} - \left(\frac{1}{5}\right)^2 \approx 0.0711\). Then, in accordance with (1.2.5), for a unit payment, 

\[ H \approx 0.2 \cdot 900 + 1.64 \cdot \sqrt{0.0711}\,\sqrt{900} \approx 193.12. \]

So, for the fund to be solvent with 95% probability, it is enough to have at the beginning 
$10,000 × 193.12 = $1.9312 million. Note that the total amount to be paid is $10,000 × 
900 = $9 million. 
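The estimate (1.2.5) is straightforward to program. The following sketch assumes the exponential model of this example and takes the normal quantile from Python's `statistics.NormalDist`; the function name `initial_fund` is hypothetical. Because the exact 0.95-quantile (about 1.645) is used instead of the rounded value 1.64, the result differs slightly from 193.12.

```python
import math
from statistics import NormalDist

def initial_fund(n, A, var, beta):
    """Normal-approximation estimate H = n*A + q*sigma*sqrt(n) for unit benefits."""
    q = NormalDist().inv_cdf(beta)        # beta-quantile of the standard normal law
    return n * A + q * math.sqrt(var * n)

mu, delta = 0.01, 0.04
A = mu / (mu + delta)                     # net single premium, (1.1.8)
var = mu / (mu + 2 * delta) - A ** 2      # rule of double rate
H = initial_fund(n=900, A=A, var=var, beta=0.95)
print(round(H, 2), round(10_000 * H))
```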


²Reprinted with permission of the Casualty Actuarial Society. 




2 SOME PARTICULAR TYPES OF CONTRACTS 


In this section, we consider some important particular types of life insurance. In most 
types, we distinguish two cases: when benefits are payable at the moment of death, and 
when they are provided at the end of the year of death. For brevity, we use slightly informal 
language and refer to the former case as that of continuous time, and to the latter case as 
the discrete time case. 

In practice, most contracts presuppose payments at the moment of death. (Certainly, 
there is a time gap between the death and the moment of payment, but companies add the 
earned interest corresponding to this period.) However, the available information comes 
from discrete time tables, which leads us to models involving only complete years. We 
establish relations between the actuarial characteristics corresponding to both cases in Sec- 
tions 7.1.5.2 and 2.1.3. 


2.1 Whole life insurance 


2.1.1 The continuous time case (benefits payable at the moment of 
death) 


We already considered this type. The payment follows the death of the insured whenever 
the death occurs. In this case, \(\Psi = T = T(x)\) for a life-age-\(x\). 

Then the present value of the payment is \(Z = e^{-\delta T(x)}\), and the APV \(\bar{A}_x = E\{e^{-\delta T(x)}\} = M_{T(x)}(-\delta)\), where \(M_{T(x)}(z)\) is the m.g.f. of \(T(x)\). In terms of the characteristics \(\mu_x(t) = \mu(x+t)\) and \({}_tp_x\) from Chapter 7, the density of \(T(x)\) is 

\[ f_{T(x)}(t) = \mu_x(t)\,{}_tp_x = \mu(x+t)\,{}_tp_x \tag{2.1.1} \]

[see (7.1.2.4)]. Then, in general, 

\[ \bar{A}_x = \int_0^\infty e^{-\delta t}\mu_x(t)\,{}_tp_x\,dt = \int_0^\infty v^t \mu_x(t)\,{}_tp_x\,dt. \tag{2.1.2} \]

As we saw, \(E\{Z^2\} = E\{e^{-2\delta T(x)}\}\), which corresponds to the APV for the doubled interest 
rate \(2\delta\). The traditional notation for this is \({}^2\bar{A}_x\). The superscript on the left means that the 
force of interest is doubled. 

A similar notation is applied to all other types of insurance discussed below. 

Making use of this notation, we can write that 

\[ \mathrm{Var}\{Z\} = {}^2\bar{A}_x - (\bar{A}_x)^2. \tag{2.1.3} \]

Concrete examples were considered in Section 1.1. 


2.1.2 The discrete time case (benefits payable at the end of the year of 
death) 


The title above presupposes that we have chosen a year as a unit of time. As a matter of 
fact, the formulas below are true for any choice of the time unit. 




We already saw that in this case, \(\Psi = K(x) + 1\) and takes on values \(1, 2, \dots\). In accordance 
with (7.1.3.1), \(P(K(x) = k) = {}_kp_x\,q_{x+k}\). Setting, as usual, \(v = e^{-\delta}\), we obtain that the APV 

\[ A_x = E\{e^{-\delta\Psi}\} = E\{v^{K+1}\} = \sum_{k=0}^{\infty} v^{k+1} P(K(x) = k) = \sum_{k=0}^{\infty} v^{k+1}\,{}_kp_x\,q_{x+k}. \tag{2.1.4} \]

Certainly, the infinite series in (2.1.4) is a mathematical abstraction. Terms for large \(k\)’s 
are small due to both factors, \(P(K(x) = k)\) and \(v^{k+1}\), and the contribution of these terms 
may be much less significant than that of terms for moderate \(k\)’s. 


EXAMPLE 1. Let us estimate the net premium \(A_{50}\) for \(\delta = 0.04\), proceeding from 
the data in Table 7.1.5.1-1 for the total population of the USA in 2002. The table estimates 
\(P(K(50) = k)\) by \(d_{50+k}/l_{50}\) for \(k \le 49\), and gives an estimate for \(P(K(50) \ge 50)\) as 
\(d_{100+}/l_{50}\). Thus, our estimate is 

\[ A_{50} \approx \sum_{k=0}^{49} \exp\{-\delta(k+1)\}\frac{d_{50+k}}{l_{50}} + \exp\{-\delta \cdot 51\}\frac{d_{100+}}{l_{50}}. \tag{2.1.5} \]

Direct calculation leads to the estimate 0.326. 

We present all such estimates in the Illustrative Table in the Appendix, Section 3. All figures 
for \(A_x\) in this table are computed in the way mentioned. The data for \(l_x\) were slightly 
smoothed in the region of ages 82-84, and rounded. The same concerns the probabilities \(q_x\), 
which are now computed from the ratios \(l_{x+1}/l_x\) (that is, \(q_x = 1 - l_{x+1}/l_x\)) with \(l_{x+1} = l_x - d_x\). 

The column in the Illustrative Table for \({}^2A_x\) corresponds to the double rate, and all values 
in it are obtained in the same way as above. 

Since in the table under consideration, we use empirical data concerning a particular 
year, applying rather straightforward estimates, rounding some numbers, and not taking 
into account mortality for ages greater than 100, we view this table as illustrative and will 
use it for illustrative purposes. 

Next, we come back to \(A_{50}\) and estimate the error of the above approximation that concerns the remainder of the series (2.1.4); namely, the quantity \(\sum_{k=50}^{\infty} e^{-\delta(k+1)} P(K = k)\). For ages greater than 100, the table contains only one aggregate estimate. If we replace the factors \(e^{-\delta(k+1)}\) for \(k \ge 50\) by their upper bound \(e^{-\delta \cdot 51}\), then we will get that 

\[ \sum_{k=50}^{\infty} e^{-\delta(k+1)} P(K = k) \le \sum_{k=50}^{\infty} e^{-\delta(50+1)} P(K = k) = e^{-51\delta} P(K \ge 50). \]

This upper bound leads to the last term in (2.1.5). The value of this term is \(\approx 0.0029\). 
Now, let us estimate the minimum of the remainder. Assume that the probabilities \(P(K = k)\) are not increasing for \(k \ge 50\) (which is quite plausible) and are negligible for \(k \ge 60\). Then 

\[ \sum_{k=50}^{\infty} e^{-\delta(k+1)} P(K = k) \approx \sum_{k=50}^{59} e^{-\delta(k+1)} P(K = k). \]

Since \(e^{-\delta(k+1)}\) is decreasing in \(k\), the minimum of this sum corresponds to the case when all probabilities \(P(K = k)\) for \(50 \le k \le 59\) are the same. In this case, \(P(K = k) \approx \frac{1}{10} P(K \ge 50) = \frac{1}{10}\,\frac{d_{100+}}{l_{50}} \approx 0.00224\), and 

\[ \sum_{k=50}^{59} e^{-\delta(k+1)} P(K = k) \approx 0.00224 \sum_{k=50}^{59} e^{-\delta(k+1)} = 0.00224 \cdot e^{-51\delta}\,\frac{1 - e^{-10\delta}}{1 - e^{-\delta}} \approx 0.0025. \]

We see that the difference between the last number and 0.0029 is not large. Let us also 
recall that the total estimate is about 0.326, so the remainder under discussion constitutes 
less than 1% of the total estimate. One may conjecture that this is much smaller than the 
accuracy of estimation in the life table (Table 7.1.5.1-1) itself, and the error which is due to 
the fact that we are considering a constant \(\delta\). 
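The two bounds for the remainder can be reproduced as follows. The sketch assumes \(P(K(50) \ge 50) \approx 0.0224\), a value inferred here from the figure 0.00224 (one tenth of the tail probability) quoted in the text rather than read from the life table itself; the variable names are hypothetical.

```python
import math

delta = 0.04
tail_prob = 0.0224   # assumed P(K(50) >= 50), inferred from 0.00224 = tail_prob / 10

# Upper bound: every discount factor replaced by the largest one, exp(-51*delta).
upper = math.exp(-delta * 51) * tail_prob

# Lower estimate: the tail mass spread evenly over k = 50, ..., 59.
lower = (tail_prob / 10) * sum(math.exp(-delta * (k + 1)) for k in range(50, 60))

print(round(upper, 4), round(lower, 4))
```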


EXAMPLE 2. Assume that the density of the r.v. \(T = T(90)\), the future lifetime of a life-age-90, may be well approximated by the linear decreasing function \(f_T(t) = \frac{2}{169}(13 - t)\) for \(t \in [0, 13]\). In Exercise 7.35, we saw that such a model fits the data from Table 7.1.5.1-1 relatively well. Set \(\delta = 0.04\), and observe that 

\[ P(K = k) = P(k \le T < k+1) = \int_k^{k+1} f_T(t)\,dt = \frac{2}{169}\left(13 - k - \frac{1}{2}\right) = \frac{1}{169}(25 - 2k). \]

(In the third step we took into account that the density is a linear function.) Then 

\[ A_{90} = \sum_{k=0}^{12} v^{k+1} P(K = k) = \sum_{k=0}^{12} e^{-0.04(k+1)}\,\frac{1}{169}(25 - 2k) \approx 0.830. \]

(There are formulas for such sums [see, e.g., (9.4.4)], but it is better to use software.) 

Like Example 1, this example also serves for illustration. There are more sophisticated 
statistical methods for evaluating the distribution of \(X\), and hence for estimating \(A_x\). 
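Since the example recommends software for such sums, here is a minimal sketch. It assumes the linear density on \([0, 13]\) described above, which gives \(P(K = k) = (25 - 2k)/169\) for \(k = 0, \dots, 12\); the function name `prob_K` is hypothetical.

```python
import math

def prob_K(k):
    """P(K = k) under the linear density (2/169)*(13 - t) on [0, 13]."""
    return (25 - 2 * k) / 169

delta = 0.04
total_prob = sum(prob_K(k) for k in range(13))
A90 = sum(math.exp(-delta * (k + 1)) * prob_K(k) for k in range(13))
print(round(total_prob, 6), round(A90, 3))
```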


Another way of computing \(A_x\) is based on the following backward recursion relation: 

\[ A_x = vq_x + p_x vA_{x+1} = v(q_x + p_x A_{x+1}). \tag{2.1.6} \]

Above, \(q_x = {}_1q_x = P(T(x) \le 1)\), and \(p_x = 1 - q_x\). We will prove (2.1.6) a bit later. 

If we know the value of \(A_n\) for some \(n\) and the probabilities \(p_x\), we can make use of 
(2.1.6), moving backward and computing \(A_x\) for all integers \(x < n\). 

Consider, for instance, the situation of Example 2. Suppose we accept the estimate of 
\(A_{90}\) that we obtained in this example. The survival function for ages less than 90 is not as 
simple as that for \(x > 90\). But we can estimate \(A_x\) for \(x\)’s less than 90 by using (2.1.6). In 
Exercise 8, the reader is invited to provide the corresponding Excel worksheet. 
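The backward recursion (2.1.6) takes only a few lines of code. As a sketch, the function below (the name `backward_apv` is hypothetical) is tested on the model of Example 2, where the survival function of \(T(90)\) is assumed to be \(((13 - t)/13)^2\) on \([0, 13]\); starting from age 103, where nothing remains to be paid, the recursion should reproduce the direct sum for \(A_{90}\).

```python
import math

def backward_apv(q, v, A_end=0.0):
    """Apply A_x = v*(q_x + p_x*A_{x+1}) backward.

    q[j] is the one-year death probability at age x + j; A_end is the APV at
    age x + len(q). Returns the list [A_x, A_{x+1}, ..., A_{x+len(q)}].
    """
    A = [0.0] * (len(q) + 1)
    A[-1] = A_end
    for j in range(len(q) - 1, -1, -1):
        A[j] = v * (q[j] + (1 - q[j]) * A[j + 1])
    return A

v = math.exp(-0.04)
surv = [((13 - t) / 13) ** 2 for t in range(14)]    # P(T(90) > t), t = 0, ..., 13
q = [1 - surv[j + 1] / surv[j] for j in range(13)]  # q_90, ..., q_102 (q_102 = 1)
A = backward_apv(q, v)
print(round(A[0], 4))                               # A_90
```

The same function can be continued to ages below 90 once the probabilities \(q_x\) for those ages are supplied.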


To prove (2.1.6), we again apply the first step analysis. Let us start with a heuristic 
reasoning. 

The insured will either die within the first year or will survive one year. The former event 
occurs with probability \(q_x\), the payment in this case will be made at the end of the first year, 
and its discounted value is equal to \(v\). 

The latter event occurs with probability \(p_x\). In this case, the insured at the end of the 
first year will be \(x + 1\) years old, and the insurance process will start over, as from the very 
beginning. The present value of the future payment in this case is \(A_{x+1}\) from the standpoint 
of the time \(t = 1\), the end of the first year. To evaluate the present value of such payments 
from the standpoint of the initial time, we should multiply \(A_{x+1}\) by the discount \(v\). 

Thus, the present value \(Z\) assumes the value \(v\) with probability \(q_x\), and, on the average, 
the value \(vA_{x+1}\) with probability \(p_x\). This is reflected in (2.1.6). 

Rigorously, we may write it as follows: 

\[ E\{Z\} = E\{Z \mid T(x) \le 1\}P(T(x) \le 1) + E\{Z \mid T(x) > 1\}P(T(x) > 1) = E\{Z \mid T(x) \le 1\}\,q_x + E\{Z \mid T(x) > 1\}\,p_x. \]

As was explained above, \(E\{Z \mid T(x) \le 1\} = v\), and \(E\{Z \mid T(x) > 1\} = vA_{x+1}\), which leads 
to (2.1.6). 

> Absolutely rigorously speaking, the very last relation also should be proved, and one 
can do it by writing 

\[ E\{Z \mid T(x) > 1\} = E\{v^{K(x)+1} \mid T(x) > 1\} = vE\{v^{K(x)} \mid T(x) > 1\} = vE\{v^{1+K(x+1)} \mid T(x) > 1\} = vE\{v^{K(x+1)+1}\} = vA_{x+1}. \]

(We used the fact that, once the insured has survived one year, her/his curtate lifetime is 
equal to one year plus how long she/he will live after attaining age \(x + 1\).) < 

From this point onward, when considering relations similar to (2.1.6), we will not always 
repeat the same standard argument each time but will rather restrict ourselves to heuristic 
proofs. 


2.1.3 A relation between \(\bar{A}_x\) and \(A_x\) 


The relations we discuss here are based on the linear interpolation procedure from Section 7.1.5.2, where we assumed that the lifetime is uniformly distributed within each year 
of age. We will show that under this assumption, 

\[ \bar{A}_x = \frac{i}{\delta}A_x, \tag{2.1.7} \]

where 

\[ i = e^{\delta} - 1. \tag{2.1.8} \]

Since we usually possess data on survival probabilities only for complete years, we can 
compute, proceeding from this data, only \(A_x\). The formula (2.1.7) gives a way to estimate 
\(\bar{A}_x\). 

The quantity \(i\) is the profit that one gets at the end of a unit time interval after investing 
one unit of money at the beginning of this interval. If time is measured in years, this is 
an effective annual interest rate (briefly, interest), or in other terminology an annual yield, 
provided that interest is compounded continuously. See Section 0.8.1 for more detail. 


It is worth noting that for small \(\delta\), the “correction” coefficient \(i/\delta\) in (2.1.7) is close to one. Say, if \(\delta = 0.04\), then \(\frac{i}{\delta} = \frac{e^{0.04} - 1}{0.04} \approx 1.0202\). This fact becomes understandable if we recall that \(e^x = 1 + x + \frac{x^2}{2} + o(x^2)\). (See Appendix (4.2.6). For the notation \(o(x)\), see Appendix, Section 4.1.) Making use of this expansion, we have 

\[ \frac{i}{\delta} = \frac{e^{\delta} - 1}{\delta} = 1 + \frac{\delta}{2} + o(\delta), \]

so we can expect that \(i/\delta\) differs from 1 approximately by \(\delta/2\). In the numerical example 
above, this is indeed the case. 
»> Using Appendix—(4.2.2), one can show that 


peo. a (2.1.9) 


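To prove (2.1.7), consider a lifetime \(T = T(x)\) and the corresponding curtate time \(K = K(x) = [T(x)]\); we omit \(x\) if it does not cause misunderstanding. Let \(T_{frac} = T_{frac}(x)\) be the fractional part of \(T(x)\), that is, \(T_{frac} = T - K\). So, \(T = K + T_{frac}\). The uniformity assumption is equivalent to the assumption that \(T_{frac}\) is uniformly distributed on \([0,1]\), regardless of the value the r.v. \(K\) has taken. In other words, \(K\) and \(T_{frac}\) are independent, and \(T_{frac}\) is uniform on \([0,1]\). Then 

\[ \bar{A}_x = E\{e^{-\delta T}\} = E\{e^{-\delta(K + T_{frac})}\} = E\{e^{-\delta K} e^{-\delta T_{frac}}\} = E\{e^{-\delta K}\}E\{e^{-\delta T_{frac}}\} = e^{\delta}E\{e^{-\delta(K+1)}\}E\{e^{-\delta T_{frac}}\} = e^{\delta}A_x E\{e^{-\delta T_{frac}}\}. \tag{2.1.10} \]

The last factor \(E\{e^{-\delta T_{frac}}\} = M_{T_{frac}}(-\delta)\), where \(M_{T_{frac}}(z)\) is the m.g.f. of \(T_{frac}\). Hence, by (0.4.3.4), \(E\{e^{-\delta T_{frac}}\} = (1 - e^{-\delta})/\delta\), and by virtue of (2.1.10), 

\[ \bar{A}_x = e^{\delta}A_x\,\frac{1 - e^{-\delta}}{\delta} = \frac{e^{\delta} - 1}{\delta}A_x = \frac{i}{\delta}A_x. \]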
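Relation (2.1.7) can be checked on the De Moivre model of Section 1.1, where the lifetime is uniform within each year exactly, so the relation should hold without error. The sketch below uses \(\delta = 0.04\) and \(\omega - x = 40\) as in the earlier examples; the variable names are hypothetical.

```python
import math

delta, n = 0.04, 40                    # force of interest and omega - x
v = math.exp(-delta)

A_discrete = v * (1 - v ** n) / (n * (1 - v))               # end-of-year-of-death APV
A_continuous = (1 - math.exp(-delta * n)) / (delta * n)     # moment-of-death APV

i = math.exp(delta) - 1                # effective annual rate, (2.1.8)
correction = i / delta                 # roughly 1 + delta/2 for small delta

print(round(correction, 4), round(correction * A_discrete, 4), round(A_continuous, 4))
```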


2.1.4 The case of benefits payable at the end of the m-thly period 


Next, we divide each year into \(m\) subintervals (for example, months, \(m = 12\), or 
quarters, \(m = 4\)) and consider the insurance with a benefit payable at the end of the \(m\)-thly 
period in which death occurs. 

Denote by \(A_x^{(m)}\) the corresponding APV. Formally, we can apply the above model to this 
case since the unit of time in this model was not specified. However, because available 
information usually concerns complete years, it is useful to specify a connection between 
\(A_x^{(m)}\) and \(A_x\). 

As in the previous section, we will provide an approximation formula assuming that the 
lifetime is uniformly distributed within each year. Then, as we will prove below, 

\[ A_x^{(m)} = \frac{i}{i^{(m)}}A_x, \tag{2.1.11} \]

where 

\[ i^{(m)} = m\left[(1+i)^{1/m} - 1\right]. \tag{2.1.12} \]

The characteristic \(i^{(m)}\) is a nominal annual interest rate corresponding to the annual interest rate \(i\). More precisely, \(i^{(m)}\) is an annual rate such that if interest is compounded at 
this rate \(m\) times in a year, then the total annual interest will be equal to \(i\). We considered 
this notion in more detail in Section 0.8.2, and showed there that the characteristic with the 
mentioned property should satisfy (2.1.12). 

In the same section, we proved that \(\lim_{m\to\infty} i^{(m)} = \delta = \ln(1+i)\), so the result (2.1.7) from 
the previous section follows from (2.1.11) as a limiting case. 




Note also that since \(i^{(m)}\) is decreasing in \(m\) (see Section 0.8.2), the “correction coefficient” 
\(i/i^{(m)}\) is increasing from 1 to \(i/\delta\), and hence for any \(m\) 

\[ A_x \le A_x^{(m)} \le \bar{A}_x. \tag{2.1.13} \]


> In particular, from this and (2.1.9) it follows that 


82 
5 


0S ml Sate 
Proof of (2.1.11). Denote by \(K^{(m)}\) the number of complete periods of the length \(1/m\) that 
the insured survived. We will call these periods \(m\)-ths. Set \(R^{(m)} = K^{(m)} - mK\). This is the 
number of complete \(m\)-ths lived in the year of death. Then \(K^{(m)} = mK + R^{(m)}\). 

Let, for example, \(m = 12\). We will then call \(m\)-ths months, though as a matter of fact 
months have different lengths. Let, say, \(T = 25.34\). Then the insured lived \([25.34 \cdot 12] = [304.08] = 304\) complete months, so \(K^{(m)} = 304\). The insured lived 25 complete years, so 
\(mK = 12 \cdot 25 = 300\), and \(R^{(m)} = 4\); that is, the insured lived 4 complete months in the last 
year. 

Let us come back to the general case. Under the assumption made, the r.v.’s \(K\) and \(R^{(m)}\) 
are independent, and \(R^{(m)}\) takes on values \(0, 1, \dots, m-1\) with the same probability \(1/m\). 

Now we should recall that \(\delta\) is an annual rate, and we measure time in years. So, the 
payment time is equal not to \(K^{(m)} + 1\) (as it would have been, if we had chosen an \(m\)-th as 
a unit of time), but to \((K^{(m)} + 1)/m\). Then 

\[ A_x^{(m)} = E\{\exp\{-\delta(K^{(m)} + 1)/m\}\} = E\{\exp\{-\delta(mK + R^{(m)} + 1)/m\}\} = E\{\exp\{-\delta K - \delta(R^{(m)} + 1)/m\}\} \]
\[ = E\{\exp\{-\delta K\}\}\,E\{\exp\{-\delta(R^{(m)} + 1)/m\}\} = e^{\delta}E\{e^{-\delta(K+1)}\}\,E\left\{\exp\left\{-\frac{\delta}{m}(R^{(m)} + 1)\right\}\right\} = A_x e^{\delta}E\left\{\exp\left\{-\frac{\delta}{m}(R^{(m)} + 1)\right\}\right\}. \tag{2.1.14} \]

The last expectation equals \(M_{R^{(m)}+1}(-\delta/m)\), where \(M_{R^{(m)}+1}(z)\) is the m.g.f. of \(R^{(m)} + 1\). 
Since \(R^{(m)} + 1\) assumes values \(1, \dots, m\) with equal probabilities, 

\[ M_{R^{(m)}+1}(z) = \sum_{k=1}^{m} e^{zk}\,\frac{1}{m} = \frac{1}{m}\,e^{z}\,\frac{1 - e^{zm}}{1 - e^{z}}. \]

Taking into account that \(e^{-\delta} = v\) and \(v = 1/(1+i)\) [see (2.1.8)], we have 

\[ M_{R^{(m)}+1}(-\delta/m) = \frac{1}{m}\,v^{1/m}\,\frac{1 - v}{1 - v^{1/m}} = \frac{1}{m}\,\frac{1 - v}{v^{-1/m} - 1} = \frac{1}{m}\,\frac{1 - 1/(1+i)}{(1+i)^{1/m} - 1} = \frac{1}{1+i}\cdot\frac{i}{m\left[(1+i)^{1/m} - 1\right]} = \frac{1}{1+i}\cdot\frac{i}{i^{(m)}}. \]

Substituting it into (2.1.14) and noticing that \(e^{\delta} = 1 + i\), we arrive at (2.1.11). < 
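Formula (2.1.11) can be verified numerically on a model where the lifetime is exactly uniform within each year. The sketch below again assumes the De Moivre setting with \(\omega - x = 40\) and \(\delta = 0.04\), so that with \(m = 12\) the payment falls at the end of one of 480 equally likely months; the function name `nominal_rate` is hypothetical.

```python
import math

def nominal_rate(i, m):
    """Nominal annual rate i^(m) = m*((1 + i)**(1/m) - 1), see (2.1.12)."""
    return m * ((1 + i) ** (1 / m) - 1)

delta, n, m = 0.04, 40, 12
v = math.exp(-delta)
i = math.exp(delta) - 1

A_x = v * (1 - v ** n) / (n * (1 - v))                 # annual end-of-year APV
A_bar = (1 - math.exp(-delta * n)) / (delta * n)       # moment-of-death APV

# Direct computation: payment at time j/m for j = 1, ..., n*m, all equally likely.
A_m_direct = sum(math.exp(-delta * j / m) for j in range(1, n * m + 1)) / (n * m)
A_m_formula = i / nominal_rate(i, m) * A_x             # right-hand side of (2.1.11)

print(round(A_x, 5), round(A_m_direct, 5), round(A_m_formula, 5), round(A_bar, 5))
```

The printed values also illustrate that the \(m\)-thly APV lies between the annual and the continuous ones.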




2.2 Deferred whole life insurance 
2.2.1 The continuous time case 


An \(m\)-year deferred whole life insurance provides for a benefit only if the insured survives \(m\) years. In accordance with the convention made in Section 1.1, in this case, we set 
the payment time \(\Psi = \infty\) (the payment will never happen) if \(T(x) \le m\), and \(\Psi = T(x)\) if 
\(T(x) > m\). Whether we include the event \(\{T(x) = m\}\) into the former case or into the latter 
does not matter because \(P(T(x) = m) = 0\). Because \(Z = e^{-\delta\Psi}\), 

\[ Z = \begin{cases} 0 & \text{if } T(x) \le m, \\ e^{-\delta T(x)} = v^{T(x)} & \text{if } T(x) > m. \end{cases} \tag{2.2.1} \]

(If \(\delta = 0\), then by convention, we set \(0 \cdot \infty = \infty\), and \(Z = e^{-\delta\Psi} = e^{-\infty} = 0\).) 
Certainly, we could write the representation (2.2.1) without involving \(\Psi\) into consideration: 
the present value equals zero if no payments are provided, and equals \(e^{-\delta T(x)}\) if payments 
are provided at time \(T(x) > m\). 

In this case, the APV is denoted by \({}_{m|}\bar{A}_x\). If \(f_{T(x)}(t)\) is the density of \(T(x)\), then by (2.2.1), 

\[ {}_{m|}\bar{A}_x = E\{Z\} = \int_m^\infty e^{-\delta t} f_{T(x)}(t)\,dt. \tag{2.2.2} \]

The following formula may simplify calculations: 

\[ {}_{m|}\bar{A}_x = {}_mp_x\,v^m\,\bar{A}_{x+m}. \tag{2.2.3} \]

Derivation of (2.2.3) is based on an argument similar to what we used in establishing 
(2.1.6). If the insured attains age \(x + m\), the insurance becomes a “usual” whole life insurance whose actuarial present value is \(\bar{A}_{x+m}\). However, it happens with the probability 
\(P(T(x) > m) = {}_mp_x\). Also, to make evaluation proper from the standpoint of the initial 
time, we should multiply \(\bar{A}_{x+m}\) by the discount factor \(v^m\). 

> Formally, it may be written as follows: 

\[ E\{Z\} = 0 \cdot P(T(x) \le m) + E\{Z \mid T(x) > m\}P(T(x) > m) = E\{e^{-\delta T(x)} \mid T(x) > m\}\,{}_mp_x \]
\[ = {}_mp_x\,E\{e^{-\delta(m + T(x+m))}\} = {}_mp_x\,e^{-\delta m}E\{e^{-\delta T(x+m)}\} = {}_mp_x\,v^m\,\bar{A}_{x+m}. \quad < \]


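EXAMPLE 1 ([151, N2]*). For a 5-year deferred whole life insurance of 1, payable at 
the moment of death of (x), you are given: \(Z\) is the present value r.v. of this insurance; 
\(\delta = 0.1\); \(\mu = 0.04\). Calculate \(\mathrm{Var}\{Z\}\). 

In our case, \(m = 5\), and the distribution of \(T\) is exponential. In accordance with (2.2.3) 
and (1.1.8), \(E\{Z\} = e^{-\mu m}e^{-\delta m}\frac{\mu}{\mu+\delta} = \frac{\mu}{\mu+\delta}e^{-(\mu+\delta)m} \approx 0.1419\); \(E\{Z^2\} = e^{-\mu m}e^{-2\delta m}\frac{\mu}{\mu+2\delta} = \frac{\mu}{\mu+2\delta}e^{-(\mu+2\delta)m} \approx 0.0502\); and \(\mathrm{Var}\{Z\} = E\{Z^2\} - (E\{Z\})^2 \approx 0.0502 - (0.1419)^2 \approx 0.0301\). 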
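The computation in this example may be scripted as follows. The sketch assumes a constant force of mortality, so that the survival probability over \(m\) years is \(e^{-\mu m}\) and the whole life APV is \(\mu/(\mu+\delta)\); the function name `deferred_exponential` is hypothetical.

```python
import math

def deferred_exponential(mu, delta, m):
    """APV of an m-year deferred whole life insurance under constant force mu."""
    survival = math.exp(-mu * m)        # probability of surviving m years
    discount = math.exp(-delta * m)     # v^m
    return survival * discount * mu / (mu + delta)

mu, delta, m = 0.04, 0.1, 5
EZ = deferred_exponential(mu, delta, m)
EZ2 = deferred_exponential(mu, 2 * delta, m)   # rule of double rate
var = EZ2 - EZ ** 2
print(round(EZ, 4), round(EZ2, 4), round(var, 4))
```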


Reprinted with permission of the Casualty Actuarial Society. 




2.2.2 The discrete time case 


In this case, an \(m\)-year deferred whole life insurance is the same type of insurance as 
above with the exception that a payment, if any, is provided at the end of the year of 
death. Under such an assumption, \(\Psi = \infty\) if \(T(x) \le m\), and \(\Psi = K(x) + 1\) for \(T(x) > m\). 
Accordingly, \(Z = 0\) if \(T(x) \le m\), and \(Z = e^{-\delta(K(x)+1)} = v^{K(x)+1}\) for \(T(x) > m\). 

The symbol for the APV is \({}_{m|}A_x\) (the bar is removed), and the counterparts of the formulas 
(2.2.2) and (2.2.3) are 

\[ {}_{m|}A_x = \sum_{k=m}^{\infty} v^{k+1} P(K(x) = k), \tag{2.2.4} \]

and 

\[ {}_{m|}A_x = {}_mp_x\,v^m\,A_{x+m}, \tag{2.2.5} \]

respectively. The reader is invited to verify (2.2.4) and (2.2.5) on her/his own applying the 
same heuristic argument as above. 

To establish a connection between \({}_{m|}\bar{A}_x\) and \({}_{m|}A_x\), we can combine (2.2.3) and (2.2.5) 
with (2.1.7), keeping in mind that this is an approximation formula. (When is it precise? 
See also Exercise 13.) 

Variances, as is easily verified, may be computed by the same rule of double rate: \(E\{Z^2\}\) 
is equal to the APV with replacement of \(\delta\) by \(2\delta\). The corresponding symbols for this 
operation are \({}^2_{m|}\bar{A}_x\) and \({}^2_{m|}A_x\). 


2.3 Term insurance 
2.3.1 Continuous time 


An \(n\)-year term insurance provides for a payment only if death occurs within \(n\) years. In 
this case, \(\Psi = T(x)\) if \(T(x) \le n\), and \(\Psi = \infty\) if \(T(x) > n\). Accordingly, 

\[ Z = \begin{cases} e^{-\delta T(x)} = v^{T(x)} & \text{if } T(x) \le n, \\ 0 & \text{if } T(x) > n. \end{cases} \tag{2.3.1} \]

Setting \(m = n\) in (2.2.1), and comparing it with (2.3.1), we see that \(n\)-term insurance is, 
in a sense, the opposite of \(n\)-deferred insurance. We will use this circumstance below. 

The traditional notation for the APV in this case is \(\bar{A}^1_{x:\overline{n}|}\), where the bar means that we deal 
with “continuous time”, and the superscript 1 marks this particular type of insurance. In 
Section 2.4.2, we consider a somewhat different type, where in the corresponding symbol 
there will be no superscript 1. 

From (2.3.1) it follows that 

\[ \bar{A}^1_{x:\overline{n}|} = \int_0^n e^{-\delta t} f_{T(x)}(t)\,dt, \tag{2.3.2} \]

which may be used for direct calculations. 
It is worthwhile to emphasize that, as follows from (2.3.2), the value of \(\bar{A}^1_{x:\overline{n}|}\) is determined 
by the distribution of \(T(x)\) in the interval \([0, n]\), and does not depend on the mortality rate 
after \(n\) years. The same is true for other moments, for example, the variance. 




EXAMPLE 1. Let us provide a quick rough estimate of \(\bar{A}^1_{20:\overline{10}|}\) proceeding from the 
data in Table 7.1.5.1-1 and \(\delta = 0.03\). As was noted repeatedly, it is not realistic to assume 
that the distribution of \(T(x)\) is exponential. However, if we look at the table mentioned, we 
will see that the estimates for \(\mu(x)\) in the interval \([20, 30]\) vary mainly because of 
random fluctuations, and are close to 0.0095. We know that the density \(f_{T(x)}(t)\) for \(t \in [0, n]\) 
is completely determined by the values of \(\mu(x)\) on \([x, x+n]\); see (7.1.2.4)-(7.1.2.5). So, 
when using (2.3.2), we can assume \(f_{T(20)}(t)\) to be exponential with \(\mu = 0.0095\). 

It makes sense to present right away a general formula for the exponential distribution. 
We have 

\[ \bar{A}^1_{x:\overline{n}|} = \int_0^n e^{-\delta t}\mu e^{-\mu t}\,dt = \mu\int_0^n e^{-(\mu+\delta)t}\,dt = \frac{\mu}{\mu+\delta}\left[1 - e^{-(\mu+\delta)n}\right] \]

[compare with (1.1.8)]. In our case, \(\bar{A}^1_{20:\overline{10}|} \approx 0.0785\). 
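The closed formula for the exponential case is easily evaluated. The sketch below plugs in the values of this example (\(\mu = 0.0095\), \(\delta = 0.03\), \(n = 10\)); the function name `term_exponential` is hypothetical.

```python
import math

def term_exponential(mu, delta, n):
    """APV of an n-year term insurance (continuous time) under constant force mu."""
    return mu / (mu + delta) * (1 - math.exp(-(mu + delta) * n))

mu, delta, n = 0.0095, 0.03, 10
A_term = term_exponential(mu, delta, n)
print(round(A_term, 4))
```

As \(n\) grows, the value approaches the whole life APV \(\mu/(\mu+\delta)\) of (1.1.8).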


It is important that if we know the values of \(\bar{A}_x\) and \(l_x\) for integer \(x\)’s, then the computation of \(\bar{A}^1_{x:\overline{n}|}\) is straightforward for any \(n\). 

Indeed, first recall that if we know \(l_x\)’s, then we know probabilities \({}_np_x = l_{x+n}/l_x\); see 
Section 7.1.4. 

Secondly, we can use the formula 

\[ \bar{A}_x = \bar{A}^1_{x:\overline{n}|} + v^n\,{}_np_x\,\bar{A}_{x+n}, \tag{2.3.3} \]

which implies 

\[ \bar{A}^1_{x:\overline{n}|} = \bar{A}_x - v^n\,{}_np_x\,\bar{A}_{x+n}. \]

The proof of (2.3.3) almost immediately follows from (2.2.1) and (2.3.1). Denote by \(Z_1\) 
the r.v. in (2.2.1) with \(m = n\), and let \(Z\) be defined as in (2.3.1). For clarity, let us write them 
down together: 

\[ Z = \begin{cases} e^{-\delta T(x)} & \text{if } T(x) \le n, \\ 0 & \text{if } T(x) > n, \end{cases} \qquad Z_1 = \begin{cases} 0 & \text{if } T(x) \le n, \\ e^{-\delta T(x)} & \text{if } T(x) > n. \end{cases} \tag{2.3.4} \]

Now it is easy to see that \(Z + Z_1 = e^{-\delta T(x)}\), which corresponds to the whole life insurance. 
Taking the expected values of both sides, using the symbols \(\bar{A}^1_{x:\overline{n}|}\), \({}_{n|}\bar{A}_x\), and \(\bar{A}_x\), respectively, 
and applying (2.2.3), we come to (2.3.3). In Exercise 17, we consider a simple heuristic 
proof of (2.3.3). 
A corresponding example will be considered for the discrete time case. 
> The next observation concerns the correlation between \(Z\) and \(Z_1\) in (2.3.4). Since 
\(Z \cdot Z_1 = 0\), the covariance 

\[ \mathrm{Cov}\{Z, Z_1\} = -E\{Z\}E\{Z_1\}; \]

see (0.2.4.7). 
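EXAMPLE 2. Assume that for some \(x\), \(n\), and \(\delta\), we have \(\bar{A}^1_{x:\overline{n}|} = 0.01\), \({}^2\bar{A}^1_{x:\overline{n}|} = 0.0005\), \({}_{n|}\bar{A}_x = 0.1\), and \({}^2_{n|}\bar{A}_x = 0.0136\). It is worth emphasizing that these numbers are marginal 
characteristics of \(Z\) and \(Z_1\). Nevertheless, we can find the correlation between the two 
types of insurance. 

Indeed, \(\mathrm{Var}\{Z\} = {}^2\bar{A}^1_{x:\overline{n}|} - (\bar{A}^1_{x:\overline{n}|})^2 = 0.0004\); \(\mathrm{Var}\{Z_1\} = {}^2_{n|}\bar{A}_x - ({}_{n|}\bar{A}_x)^2 = 0.0036\). Thus, 
\(\sigma_Z = 0.02\), and \(\sigma_{Z_1} = 0.06\). Next, \(\mathrm{Cov}\{Z, Z_1\} = -\bar{A}^1_{x:\overline{n}|} \cdot {}_{n|}\bar{A}_x = -0.001\). Eventually, 

\[ \mathrm{Corr}\{Z, Z_1\} = \frac{-E\{Z\}E\{Z_1\}}{\sigma_Z\,\sigma_{Z_1}} = \frac{-0.001}{0.02 \cdot 0.06} = -\frac{5}{6}. \quad < \]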
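The correlation in this example depends only on the four marginal quantities, which makes it a one-line computation. In the sketch below the function name `corr_term_deferred` and its argument names are hypothetical; the covariance uses the fact that the two present values are never positive simultaneously.

```python
import math

def corr_term_deferred(A_term, A2_term, A_def, A2_def):
    """Correlation between the term (Z) and deferred (Z1) present values."""
    var_z = A2_term - A_term ** 2       # rule of double rate for Z
    var_z1 = A2_def - A_def ** 2        # rule of double rate for Z1
    cov = -A_term * A_def               # E{Z*Z1} = 0, so Cov = -E{Z}E{Z1}
    return cov / math.sqrt(var_z * var_z1)

rho = corr_term_deferred(A_term=0.01, A2_term=0.0005, A_def=0.1, A2_def=0.0136)
print(round(rho, 4))
```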


2.3.2 Discrete time 


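For an \(n\)-year term life insurance, in the case of “discrete time”, 

\[ Z = \begin{cases} e^{-\delta(K(x)+1)} = v^{K(x)+1} & \text{if } K(x) \le n-1, \\ 0 & \text{otherwise.} \end{cases} \]

We write \(K(x) + 1\) because if \(K(x) = k\), then the payment is provided at the end of the year 
of death, that is, at the moment \(k + 1\). 

The traditional notation for the APV is \(A^1_{x:\overline{n}|}\). The counterparts of (2.3.2) and (2.3.3) look 
as follows: 

\[ A^1_{x:\overline{n}|} = \sum_{k=0}^{n-1} v^{k+1} P(K(x) = k) = \sum_{k=0}^{n-1} v^{k+1}\,{}_kp_x\,q_{x+k} \tag{2.3.5} \]

[compare with (2.1.4)], and 

\[ A_x = A^1_{x:\overline{n}|} + v^n\,{}_np_x\,A_{x+n}, \quad\text{or}\quad A^1_{x:\overline{n}|} = A_x - v^n\,{}_np_x\,A_{x+n}. \tag{2.3.6} \]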


EXAMPLE 1. Using the Illustrative Life Table at the end of the book, compute \(A^1_{30:\overline{35}|}\) 
for the interest rate \(\delta = 0.04\). First of all, \({}_{35}p_{30} = l_{65}/l_{30} = 82609/97743 \approx 0.8452\). 
Next, from the Illustrative Table, we have \(A_{30} \approx 0.1666\), and \(A_{65} \approx 0.5055\). Then \(A^1_{30:\overline{35}|} = A_{30} - v^{35}\,{}_{35}p_{30}\,A_{65} \approx 0.1666 - e^{-0.04 \cdot 35} \cdot 0.8452 \cdot 0.5055 \approx 0.06124\). 


The next formula suggests a straight recursive way of computing \(A^1_{x:\overline{n}|}\). Namely, 

\[ A^1_{x:\overline{n}|} = vq_x + vp_x A^1_{x+1:\overline{n-1}|}. \tag{2.3.7} \]

The logic of the derivation is the same. With probability \(q_x\) the insured will die within the 
first year. In this case a unit of money will be paid at the end of the year, and the present 
value of this payment is \(v\). With the complement probability \(p_x\), the insured will attain age 
\(x + 1\), and the contract will become equivalent to the \((n-1)\)-term contract for a life-age-\((x+1)\). The formal proof uses the formula for total expectation, and is very similar to what 
we did above. 

To use (2.3.7), first we replace \(n\) in this formula by the letter \(k\), and \(x\) by \(x + n - k\). This 
leads to 

\[ A^1_{x+n-k:\overline{k}|} = v\left[q_{x+n-k} + p_{x+n-k}\,A^1_{x+n-(k-1):\overline{k-1}|}\right]. \tag{2.3.8} \]

Denote \(A^1_{x+n-k:\overline{k}|}\) by \(g(k)\). Then (2.3.8) may be rewritten as 

\[ g(k) = v\left[q_{x+n-k} + p_{x+n-k}\,g(k-1)\right]. \tag{2.3.9} \]




k    q_{65-k}    g(k)        g(k) for v^2 
0                0           0 
1    0.019114    0.01835     0.017615914 
2    0.01748     0.034089    0.032061011 
3    0.016199    0.047746    0.043997675 
4    0.014908    0.059465    0.053682589 
5    0.013701    0.069456    0.0614225 

Variance (Cell D10): 0.056598 

FIGURE 1. 


This is a recursion formula. Note that \(g(0) = A^1_{x+n:\overline{0}|} = 0\), since \(A^1_{x:\overline{0}|} = 0\) for any \(x\). (The 
time period has zero length, and nothing will be paid.) Then from (2.3.9) it follows that 
\(g(1) = vq_{x+n-1}\). Setting \(k = 2, \dots, n\) consecutively in (2.3.9), we can get \(g(n) = A^1_{x:\overline{n}|}\) in \(n\) 
steps. 


EXAMPLE 2. Let $x = 60$, $n = 5$, and for the corresponding part of the population, $l_{60} = 91748$, $l_{61} = 90491$, $l_{62} = 89142$, $l_{63} = 87698$, $l_{64} = 86165$, and $l_{65} = 84518$. Estimate $A^1_{60:\overline{5}|}$ for $v = 0.96$. We can readily provide the recursive procedure. The corresponding Excel worksheet is given in Fig. 1. Years are considered there in descending order, so $l_{60}$ is in Cell B7, and $l_{65}$ in B2. The probabilities $q_x$ are estimated by $(l_x - l_{x+1})/l_x$ and are given in Column C. The command for C3 is =(B3-B2)/B3. The quantities $g(k)$ are computed recursively in Column D by (2.3.9), where $p_x = 1 - q_x$. The corresponding command for D3 is =$G$2*(C3+(1-C3)*D2). The result is $g(5)$ given in Cell D7. Thus, $A^1_{60:\overline{5}|} \approx 0.0694$.


Now it will not take long to compute the variance. We should repeat the same procedure for the double rate $2\delta$, which corresponds to the squared discount $v^2$. To this end, we should just copy Column D, keeping the same commands and replacing $v$ by $v^2$. It is done in Column E with the command for E3 equal to =$G$5*(C3+(1-C3)*E2). Thus, in Cell E7 we have ${}^2A^1_{60:\overline{5}|} \approx 0.0614$. Then, we compute the variance as ${}^2A^1_{60:\overline{5}|} - \left(A^1_{60:\overline{5}|}\right)^2 \approx 0.0566$ in Cell D10=E7-D7^2.
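The worksheet logic can also be sketched in Python. The function name `term_apv` and the dictionary layout are our own choices; the survivor counts are those of Example 2, and the recursion is (2.3.9) with $g(0) = 0$.

```python
# Survivors l_60, ..., l_65 from Example 2.
survivors = {60: 91748, 61: 90491, 62: 89142, 63: 87698, 64: 86165, 65: 84518}


def term_apv(survivors, x, n, v):
    """Backward recursion (2.3.9): g(k) = v * (q + p * g(k-1)), with g(0) = 0."""
    g = 0.0
    for k in range(1, n + 1):
        age = x + n - k  # ages 64, 63, ..., 60 when x = 60 and n = 5
        q = (survivors[age] - survivors[age + 1]) / survivors[age]
        g = v * (q + (1.0 - q) * g)
    return g


v = 0.96
A1 = term_apv(survivors, 60, 5, v)        # APV at rate delta
A2 = term_apv(survivors, 60, 5, v ** 2)   # same recursion at the doubled rate
variance = A2 - A1 ** 2
print(f"APV = {A1:.6f}, second moment = {A2:.7f}, variance = {variance:.6f}")
```

Running the same function with $v^2$ in place of $v$ plays the role of copying Column D into Column E.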


Note that the arrival of powerful computers reduces the demand for such procedures as above. In our particular case, what we really need to know are the probabilities $p_x$. If we know them, then we may immediately compute the probabilities $P(K = k) = {}_kp_x\cdot q_{x+k}$, and then compute APVs by direct (straightforward) formulas like (2.3.5). Nevertheless, procedures as above may make programs more flexible and less time consuming.
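For comparison, here is a sketch of the direct summation (2.3.5) on the data of Example 2. The helper name `term_apv_direct` is ours. It uses the fact that ${}_kp_x\,q_{x+k} = (l_{x+k} - l_{x+k+1})/l_x$.

```python
# Survivors l_60, ..., l_65 from Example 2.
survivors = {60: 91748, 61: 90491, 62: 89142, 63: 87698, 64: 86165, 65: 84518}


def term_apv_direct(survivors, x, n, v):
    """Direct sum (2.3.5): sum over k of v^(k+1) * P(K = k)."""
    total = 0.0
    for k in range(n):
        # P(K = k) = k_p_x * q_{x+k} = (l_{x+k} - l_{x+k+1}) / l_x
        prob_k = (survivors[x + k] - survivors[x + k + 1]) / survivors[x]
        total += v ** (k + 1) * prob_k
    return total


A1_direct = term_apv_direct(survivors, 60, 5, 0.96)
print(f"{A1_direct:.6f}")
```

Both routes give the same number; the recursion only reorganizes the sum.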


2.4 Endowments 


2.4.1 Pure endowment 


In an $n$-year pure endowment insurance, benefits are payable at the end of the $n$th year, provided that the insured survives this term. In this case $\Psi = n$ if $T(x) > n$, and $\Psi = \infty$ if $T(x) \le n$. Accordingly,

$$Z = \begin{cases} 0 & \text{if } T(x) \le n,\\ e^{-\delta n} = v^n & \text{if } T(x) > n. \end{cases} \tag{2.4.1}$$


The traditional notation for the APV is $A_{x:\overset{1}{\overline{n}|}}$, or ${}_nE_x$. The latter notation is used more frequently in the annuity context (see Chapter 9) but, for the understandable reason, we will prefer to use it in this chapter also.

Clearly,

$${}_nE_x = E\{Z\} = v^nP(T(x) > n) = v^n\,{}_np_x.$$

To compute the variance, recall that for a r.v. $X = c$ or $0$ with probabilities $p$ and $q$, respectively, $Var\{X\} = c^2pq$. Hence, since $P(T(x) > n) = {}_np_x$,

$$Var\{Z\} = v^{2n}\,{}_np_x(1 - {}_np_x). \tag{2.4.2}$$


In Exercise 25, we prove the same making use of the rule of double rate (1.1.5). 


EXAMPLE 1. As in Example 2.3.2-2, let $l_{60} = 91748$ and $l_{65} = 84518$. Then ${}_5p_{60} = l_{65}/l_{60} \approx 0.9211$, and for $v = 0.96$, we have ${}_5E_{60} \approx (0.96)^5\cdot 0.9211 \approx 0.7510$. The fact that the number we got is much higher than the answer in Example 2.3.2-2 is not surprising. For the insured, there is much more chance of surviving five years than of dying before the end of the term. So, the pure endowment should cost more than the 5-year-term insurance. By (2.4.2), the standard deviation $\sigma_Z = v^n\sqrt{{}_np_x(1 - {}_np_x)} \approx (0.96)^5\sqrt{0.9211\cdot 0.0789} \approx 0.2198$.
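A minimal Python check of these two numbers, assuming only the survivor counts and the discount factor given in the example (the variable names are ours):

```python
import math

l_60, l_65 = 91748, 84518
v, n = 0.96, 5

n_p_x = l_65 / l_60                                # 5-year survival probability
pure_endowment = v ** n * n_p_x                    # nEx = v^n * n_p_x
std_dev = v ** n * math.sqrt(n_p_x * (1 - n_p_x))  # square root of (2.4.2)
print(f"5p60 = {n_p_x:.4f}, 5E60 = {pure_endowment:.4f}, sd = {std_dev:.4f}")
```

The standard deviation is computed from the unrounded survival probability, so its last digit may differ from the value in the text.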


2.4.2 Endowment 


In an $n$-year endowment insurance, the benefit is paid upon death if the insured dies within the interval $[0,n]$, and it is paid at the end of the period mentioned if the insured survives $n$ years.

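More precisely, in the continuous time case, $\Psi = \min\{T(x), n\}$ and

$$Z = \begin{cases} e^{-\delta T(x)} & \text{if } T(x) \le n,\\ e^{-\delta n} & \text{if } T(x) > n. \end{cases} \tag{2.4.3}$$

The APV is denoted by $\bar A_{x:\overline{n}|}$ (there is no superscript 1). Denote by $Z_1$ the r.v. in (2.3.1), and by $Z_2$ the r.v. in (2.4.1). Let us write these r.v.'s down as follows:

$$Z_1 = \begin{cases} e^{-\delta T(x)} & \text{if } T(x) \le n,\\ 0 & \text{if } T(x) > n, \end{cases} \qquad Z_2 = \begin{cases} 0 & \text{if } T(x) \le n,\\ e^{-\delta n} & \text{if } T(x) > n. \end{cases} \tag{2.4.4}$$

We see that $Z = Z_1 + Z_2$, and hence

$$\bar A_{x:\overline{n}|} = E\{Z\} = E\{Z_1\} + E\{Z_2\} = \bar A^1_{x:\overline{n}|} + {}_nE_x = \bar A^1_{x:\overline{n}|} + v^n\,{}_np_x. \tag{2.4.5}$$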


In Exercise 32, the reader is encouraged to provide heuristic and rigorous proofs of this 
relation. 




For the discrete time case, we define $\Psi$ as $\min\{K(x)+1, n\}$. Then

$$Z = e^{-\delta\min\{K(x)+1,\,n\}} = v^{\min\{K(x)+1,\,n\}}.$$

To clarify why we wrote $K+1$ above, consider the situation when the payment is made at the moment $n$. It may happen in the following two cases.

• The insured dies within the interval $[n-1, n)$. Then the payment will be provided at the end of the $n$th year, that is, at the moment $t = n$. In this case, $K = n-1$, and $\min\{K(x)+1, n\}$ is indeed equal to $n$.

• The insured attains the age of $x+n$. Then the payment is again provided at the moment $n$. On the other hand, in this case $K(x) \ge n$, and $\min\{K(x)+1, n\} = n$.

The APV is denoted by $A_{x:\overline{n}|}$. The counterpart of (2.4.4) for the discrete case is

$$Z_1 = \begin{cases} e^{-\delta(K(x)+1)} & \text{if } K(x) < n,\\ 0 & \text{if } K(x) \ge n, \end{cases} \qquad Z_2 = \begin{cases} 0 & \text{if } K(x) < n,\\ e^{-\delta n} & \text{if } K(x) \ge n. \end{cases} \tag{2.4.6}$$

We again have $Z = Z_1 + Z_2$, and

$$A_{x:\overline{n}|} = A^1_{x:\overline{n}|} + {}_nE_x = A^1_{x:\overline{n}|} + v^n\,{}_np_x. \tag{2.4.7}$$


Since the last relation is true for any rate $\delta$, it is true for the doubled rate $2\delta$ also, so we can write

$${}^2A_{x:\overline{n}|} = {}^2A^1_{x:\overline{n}|} + {}^2_nE_x = {}^2A^1_{x:\overline{n}|} + v^{2n}\,{}_np_x. \tag{2.4.8}$$


This gives a good way of computing variances. 


EXAMPLE 1. We combine the results from Example 2.3.2-2 and Example 2.4.1-1, which deal with the same data. From the former example, we have $A^1_{60:\overline{5}|} \approx 0.069$, and from the latter, ${}_5E_{60} \approx 0.751$. Hence,

$$A_{60:\overline{5}|} = A^1_{60:\overline{5}|} + {}_5E_{60} \approx 0.069 + 0.751 = 0.820.$$

Let us consider variances. In Example 2.3.2-2, we got ${}^2A^1_{60:\overline{5}|} \approx 0.0614$. Next we compute ${}^2_5E_{60} = E\{Z_2^2\}$, where $Z_2$ is the present value of the corresponding pure endowment. In Example 2.4.1-1, we computed ${}_5p_{60} \approx 0.9211$. Then $E\{Z_2^2\} = v^{2n}\,{}_np_x \approx (0.96)^{10}\cdot 0.9211 \approx 0.6124$. Now, by (2.4.8), ${}^2A_{60:\overline{5}|} = 0.0614 + 0.6124 = 0.6738$, and $Var\{Z\} = {}^2A_{60:\overline{5}|} - \left(A_{60:\overline{5}|}\right)^2 \approx 0.6738 - (0.820)^2 = 0.0014$.

▷ It is interesting to observe that, while $Var\{Z_1\} \approx 0.057$ and $Var\{Z_2\} \approx 0.048$ (see Examples 2.3.2-2 and 2.4.1-1), the variance of the sum $Z = Z_1 + Z_2$ is only about 0.0014! This is connected with the fact that the r.v.'s $Z_1$ and $Z_2$ are negatively correlated. We considered a similar fact in Section 2.3.1, and will come back to this in Exercise 18. □ ◁
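The size of this effect can be traced through the covariance. The following is a short check using the rounded values quoted above. By the definitions of $Z_1$ and $Z_2$, at most one of them is non-zero, so $Z_1Z_2 = 0$ and

$$Cov\{Z_1, Z_2\} = E\{Z_1Z_2\} - E\{Z_1\}E\{Z_2\} = -E\{Z_1\}E\{Z_2\} \approx -0.069\cdot 0.751 \approx -0.0518,$$

$$Var\{Z\} = Var\{Z_1\} + Var\{Z_2\} + 2\,Cov\{Z_1, Z_2\} \approx 0.057 + 0.048 - 0.1036 \approx 0.0014.$$

Because the result is a small difference of nearly equal numbers, its last digits are sensitive to the rounding of the inputs.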


Route 1 => page 467 




3 VARYING BENEFITS 
3.1 Certain payments 


Fixed benefits that do not depend on the payment time and other circumstances are called 
level benefits. In this case, we can choose this fixed amount of benefits to be the unit of 
money, which we did in all models above. In general, the size of the benefits to be paid can 
vary in time and/or may depend on causes of death. The latter case is considered in Section 
4; this section concerns the dependence on time. 

Let $c_t$ be the benefit to be paid if the payment time turns out to be equal to $t$. The present value of such a payment is $c_te^{-\delta t}$. Since the time of payment is the r.v. $\Psi$, the benefit is the r.v. $c_\Psi$. Then the present value of the benefit to be paid is the r.v.

$$Z = c_\Psi e^{-\delta\Psi}. \tag{3.1.1}$$

The APV

$$A = E\{c_\Psi e^{-\delta\Psi}\}. \tag{3.1.2}$$

The rule of double rate does not work here, since in our case,

$$E\{Z^2\} = E\{c_\Psi^2e^{-2\delta\Psi}\}.$$

If we can compute the last quantity, we can find the variance writing

$$Var\{Z\} = E\{c_\Psi^2e^{-2\delta\Psi}\} - \left(E\{c_\Psi e^{-\delta\Psi}\}\right)^2.$$

Consider whole life insurance. For APVs, we will keep the same notations $\bar A_x$ and $A_x$. In accordance with (3.1.2),

$$\bar A_x = \int_0^\infty c_te^{-\delta t}f_{T(x)}(t)\,dt. \tag{3.1.3}$$

In the discrete time case, we have

$$A_x = \sum_{k=0}^\infty c_{k+1}e^{-\delta(k+1)}P(K(x) = k).$$
k=0 


EXAMPLE 1. A special life insurance on a life-age-40 provides for a payment of 3 times 
the annual salary paid to the family if and only if the insured dies before the retirement age 
of 65. Assume that all possible moments of death are equally likely within the interval 
[40,65], and the probability of attaining the age of 65 is 0.85. The initial salary is $60,000, 
and it is growing at an annual rate of 4%. The investment rate $\delta = 6\%$. Find the net single premium, that is, the APV.

(a) We take, as a unit of money, the tripled initial salary $180,000. First, consider the model when the salary is growing continuously. This means that $c_t = \exp\{0.04t\}$ if $t \le 25$, and $c_t = 0$ otherwise.




Under the assumption we made, the density $f(t) = f_{T(40)}(t)$ is constant on $[0,25]$. On the other hand, $P(T(40) \le 25) = \int_0^{25}f(t)\,dt$. Since this probability equals 0.15, the density $f(t) = 0.15\cdot\frac{1}{25} = 0.006$ on $[0,25]$. What $f(t)$ equals on $[25,\infty)$ does not matter, since $c_t = 0$ for $t > 25$ and integration in (3.1.3) should be carried out over $[0,25]$. Thus, by (3.1.3),

$$\bar A_x = \int_0^\infty c_te^{-\delta t}f_{T(x)}(t)\,dt = \int_0^{25}c_te^{-\delta t}f_{T(x)}(t)\,dt = \int_0^{25}e^{0.04t}e^{-0.06t}\cdot 0.006\,dt = 0.006\,\frac{1 - e^{-0.02\cdot 25}}{0.02} \approx 0.118 \text{ units}.$$

Note that the expected value of the real payment is $\int_0^{25}e^{0.04t}f(t)\,dt = \int_0^{25}e^{0.04t}\cdot 0.006\,dt \approx 0.258$ units, so the net single premium is less than half the expected benefit. In dollars, the premium is 0.118 · $180,000 = $21,240.
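As a cross-check of part (a), the integral in (3.1.3) can be evaluated both in closed form and numerically. The sketch below uses a simple midpoint rule; the helper `midpoint_integral` and the variable names are ours, not part of the text.

```python
import math


def midpoint_integral(f, a, b, steps=20_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


delta, growth = 0.06, 0.04
density = 0.15 / 25  # constant density of T(40) on [0, 25]

# Closed form: 0.006 * (1 - exp(-0.02 * 25)) / 0.02
apv_closed = density * (1 - math.exp(-(delta - growth) * 25)) / (delta - growth)
# Numerical evaluation of the same integral
apv_numeric = midpoint_integral(
    lambda t: math.exp(growth * t) * math.exp(-delta * t) * density, 0, 25
)
# Expected undiscounted payment, for comparison
expected_benefit = midpoint_integral(lambda t: math.exp(growth * t) * density, 0, 25)

print(f"APV = {apv_closed:.4f} units, expected payment = {expected_benefit:.4f} units")
```

The dollar premium in the text is obtained from the rounded value 0.118; multiplying the unrounded APV by 180,000 gives a figure a few dollars higher.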


(b) In reality, the increase of salaries is carried out in a discrete way. Assume that the salary is increased by 4% at the beginning of each year. Then $c_t = 1.04^{[t]}$, where $[t]$ is the integer part of $t$. The benefits are still paid at the time of death. In this case,

$$\bar A_x = \int_0^{25}1.04^{[t]}e^{-\delta t}\cdot 0.006\,dt = \sum_{k=0}^{24}\int_k^{k+1}1.04^ke^{-\delta t}\cdot 0.006\,dt = 0.006\sum_{k=0}^{24}1.04^k\int_k^{k+1}e^{-\delta t}\,dt.$$

The last integral is equal to $\frac{i}{\delta}e^{-\delta(k+1)}$, where $i = e^\delta - 1$; see also Section 2.1.3. Thus, for $\delta = 0.06$,

$$\bar A_x = 0.006\,\frac{e^{0.06}-1}{0.06}\,e^{-0.06}\sum_{k=0}^{24}\left(1.04e^{-0.06}\right)^k = 0.006\,\frac{e^{0.06}-1}{0.06}\,e^{-0.06}\,\frac{1 - \left(1.04e^{-0.06}\right)^{25}}{1 - 1.04e^{-0.06}} \approx 0.1147.$$

Certainly, we had to expect that the answer would be less than that in the case of continuous salary growth (why?). Eventually, the net single premium is 0.1147 · $180,000 ≈ $20,650.
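The year-by-year sum of part (b) is easy to evaluate directly. The sketch below integrates $e^{-\delta t}$ exactly over each year and compares the result with the geometric-sum form; the names `year_integral`, `apv_sum` and `apv_closed` are ours.

```python
import math

delta = 0.06
density = 0.006  # constant density of T(40) on [0, 25]


def year_integral(k):
    """Exact integral of exp(-delta * t) over [k, k + 1]."""
    return (math.exp(-delta * k) - math.exp(-delta * (k + 1))) / delta


# Salary factor 1.04^k applies throughout policy year k (k = 0, ..., 24).
apv_sum = density * sum(1.04 ** k * year_integral(k) for k in range(25))

# Same quantity via i = exp(delta) - 1 and the geometric sum.
i = math.exp(delta) - 1
r = 1.04 * math.exp(-delta)
apv_closed = density * (i / delta) * math.exp(-delta) * (1 - r ** 25) / (1 - r)

print(f"APV = {apv_sum:.4f} units, premium = ${180_000 * apv_sum:,.0f}")
```

The result can be compared with the continuous-growth value of part (a): the stepwise salary never exceeds the continuously growing one, so the APV here must be smaller.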


Now consider a special case when the benefit is either growing linearly in time, or increases each year by a fixed amount at the beginning of the year. More precisely, we consider either

$$c_t = ct,$$

where $c$ is the rate of growth, or

$$c_t = c\cdot[t+1],$$

where $c$ is the absolute increase per year. Without loss of generality, one can set $c = 1$.

The former insurance is called continuously increasing (whole life) insurance; the APV in this case is denoted by $(\bar I\bar A)_x$ ($I$ comes from “increasing”, the bar indicates that “time is continuous”). In the latter case, we deal with an annually increasing insurance. (It would be more precise to simply call it a discretely increasing insurance, since formally we do not specify the unit of time here.) The notation for the APV in this case is $(I\bar A)_x$.


EXAMPLE 2. Find $(\bar I\bar A)_{50}$ for $\delta = 0.04$, if $X$ follows De Moivre's law with $\omega = 100$. Thus, $T(50)$ is uniform on $[0,50]$, and

$$(\bar I\bar A)_{50} = \int_0^{50}te^{-\delta t}\,\frac{1}{50}\,dt.$$

Integration by parts or use of software leads to $(\bar I\bar A)_{50} \approx 7.42$.

EXAMPLE 3. For the same $\delta$ and the same distribution of $X$ as in Example 2, find $(I\bar A)_{50}$. Similarly to what we did in Example 1b,

$$(I\bar A)_{50} = \int_0^{50}[t+1]e^{-\delta t}\,\frac{1}{50}\,dt = \sum_{k=0}^{49}\int_k^{k+1}(k+1)e^{-\delta t}\,\frac{1}{50}\,dt = \frac{i}{50\,\delta}\sum_{k=0}^{49}(k+1)e^{-\delta(k+1)}.$$

There are formulas for such sums [see, e.g., (9.4.4)], but it is easier to compute the last quantity using software. For $\delta = 0.04$, the answer is ≈ 7.64.

The fact that the answer is larger than that in Example 2 does not contradict common sense. Although the benefit in Example 2 is growing continuously, it starts from zero, while in Example 3, the initial value is one.
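Both examples can be checked with a short script. The sketch below uses the standard antiderivative of $te^{-at}$ for Example 2 and a direct sum for Example 3; the function and variable names are ours.

```python
import math

delta = 0.04
horizon = 50  # T(50) is uniform on [0, 50] under De Moivre's law with omega = 100


def integral_t_exp(a, upper):
    """Integral of t * exp(-a * t) over [0, upper], obtained by parts."""
    return (1 - math.exp(-a * upper) * (1 + a * upper)) / a ** 2


# Example 2: benefit c_t = t.
ia_continuous = integral_t_exp(delta, horizon) / horizon

# Example 3: benefit c_t = [t + 1], with i = exp(delta) - 1.
i = math.exp(delta) - 1
ia_annual = i / (horizon * delta) * sum(
    (k + 1) * math.exp(-delta * (k + 1)) for k in range(horizon)
)

print(f"continuously increasing: {ia_continuous:.3f}, annually increasing: {ia_annual:.3f}")
```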


In the discrete time case, the annually increasing whole life insurance corresponds to $\Psi = K+1$ and $c_{k+1} = k+1$. Then the present value $Z = (K+1)v^{K+1}$. The standard notation for the APV is $(IA)_x$.

EXAMPLE 4. Find $(IA)_x$ if $\mu(x) = \mu$. We know (see Exercise 7-28) that $K$ has a geometric distribution (0.3.1.9) with parameter $p = 1 - e^{-\mu}$. Then, setting $v = e^{-\delta}$, we have

$$(IA)_x = E\{(K+1)v^{K+1}\} = \sum_{k=0}^\infty(k+1)e^{-\delta(k+1)}(1-e^{-\mu})e^{-\mu k} = (1-e^{-\mu})e^{-\delta}\sum_{k=0}^\infty(k+1)e^{-(\mu+\delta)k}.$$

Making use of the general formula

$$\sum_{k=0}^\infty(k+1)q^k = \frac{1}{(1-q)^2}, \tag{3.1.4}$$

we get

$$(IA)_x = \frac{e^{-\delta}(1-e^{-\mu})}{\left(1 - e^{-(\mu+\delta)}\right)^2}.$$

(The formula (3.1.4) may be found in practically any algebra or calculus textbook, and in any probability textbook, since with its help we derive the formula (0.3.1.12) for the mean of the geometric distribution (e.g., [116, p.174], [122, p.67]). See also (9.4.4).)
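The closed form can be sanity-checked against a truncated version of the defining sum. In the sketch below, the values $\mu = 0.02$ and $\delta = 0.05$ are purely illustrative assumptions (the example itself is symbolic), and the function names are ours.

```python
import math


def ia_closed_form(mu, delta):
    """(IA)_x under a constant force of mortality, via formula (3.1.4)."""
    return math.exp(-delta) * (1 - math.exp(-mu)) / (1 - math.exp(-(mu + delta))) ** 2


def ia_partial_sum(mu, delta, terms=5000):
    """Truncated sum of (k+1) * v^(k+1) * P(K = k) for a geometric K."""
    p = 1 - math.exp(-mu)
    return sum(
        (k + 1) * math.exp(-delta * (k + 1)) * p * math.exp(-mu * k)
        for k in range(terms)
    )


mu, delta = 0.02, 0.05  # illustrative values only
closed = ia_closed_form(mu, delta)
partial = ia_partial_sum(mu, delta)
print(f"closed form = {closed:.6f}, truncated sum = {partial:.6f}")
```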

One more simple example is considered in Exercise 44. 


Another special case is an $n$-year term decreasing insurance where the payments are decreasing from $n$ to zero either linearly (continuously decreasing) in accordance with the formula $c_t = n - t$, or in the discrete way when $c_t = n - [t]$ (annually decreasing). The symbols for the APV in these cases are $(\bar D\bar A)^1_{x:\overline{n}|}$ and $(D\bar A)^1_{x:\overline{n}|}$, respectively. In Exercise 40b, we compute these quantities when $X$ is uniform.

The notation for the discrete case is, naturally, $(DA)^1_{x:\overline{n}|}$.


It is important to keep in mind that for certain varying benefits, we can compute the 
APVs combining term insurances with level benefits. 




EXAMPLE 5. Consider a discrete-time whole life insurance on a life-age-50, providing 
for payment of $100K during the first 10 years; $110K in the 11th year after policy issue; 
and in each year after, an increase of $10K until the insured reaches the age of 70. If the 
insured attains this age, the benefit will be level (will not change) at $200K. Write the APV in terms of the characteristic $A_x$ and survival probabilities.

We take $10,000 as a unit of money. The company will pay 10 units for sure, and this is 
equivalent to the whole life insurance with a benefit of 10. If the insured attains the age of 
60, an additional one unit will be added to the benefit. This is equivalent to an additional 
10-year deferred whole life insurance with unit benefit. If the insured attains the age of 61, 
again an additional unit will be added to the benefit, and this is equivalent to an additional 
11-year deferred whole life insurance with unit benefit. We can continue to reason in this 
fashion until age 69 when the last additional unit will be added to the benefit, and the whole 
benefit will be equal to 20 units. Thus, 


$$APV = 10A_{50} + v^{10}\,{}_{10}p_{50}A_{60} + v^{11}\,{}_{11}p_{50}A_{61} + \cdots + v^{19}\,{}_{19}p_{50}A_{69}.$$


EXAMPLE 6 ([153, N1]*) demonstrates straight calculations. XYZ Bank has issued a 
5-year interest free loan, collecting annual payments of $10,000 at the end of each year. 
To protect itself from loan defaults, XYZ has purchased default insurance that pays the 
balance of the loan at the time of default. The probabilities of default at each payment due 
date are given in the table below. 


Payment number 1 2 3 4 5 
Probability of default (given no prior default) 0.04 0.08 0.10 0.10 0.06 


The annual interest for the default insurance is 6%. Calculate the expected present value 
of the insurance benefit. 
It is convenient to present calculations in the following table. 


$k$ | $q_k$ | ${}_kp$ | $P(K=k)$ | $v^k(6-k)P(K=k)$
1 | 0.04 | 0.9600 | 0.0400 | 0.1887
2 | 0.08 | 0.8832 | 0.0768 | 0.2734
3 | 0.10 | 0.7949 | 0.0883 | 0.2225
4 | 0.10 | 0.7154 | 0.0795 | 0.1259
5 | 0.06 | 0.6725 | 0.0429 | 0.0320

Here, $k$ stands for the number of a payment, and $q_k$ is the given probability of default. In the third column, we compute the probability that no default happens before or at the moment of the $k$th payment. Thus, ${}_kp = {}_{k-1}p\cdot(1 - q_k)$. For example, the number in the second row equals $(1 - 0.04)(1 - 0.08)$. The fourth column contains $P(K = k)$, where $K$ is the moment of default, if any. Clearly, $P(K = k) = {}_{k-1}p\cdot q_k$.


Reprinted with permission of the Casualty Actuarial Society. 




Let us choose $10,000 as a unit of money. If default happens at the moment of the $k$th payment, then the insurance pays $(6-k)$ units, which should be multiplied by the discount $v^k$, where $v = \frac{1}{1.06} \approx 0.9434$. The fifth column gives the values of $v^k(6-k)P(K=k)$.

The APV is the sum of all values in the fifth column, which is equal to 0.8426. So, the 
APV is $10,000 - 0.8426 = $8,426. 
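The table can be rebuilt with a short loop. This is a sketch using the default probabilities and the 6% rate given in the example; the variable names are ours.

```python
# Conditional default probabilities q_1, ..., q_5 from the table.
q = [0.04, 0.08, 0.10, 0.10, 0.06]
v = 1 / 1.06

survive = 1.0    # probability of no default through the previous payment
apv_units = 0.0  # APV in units of $10,000
rows = []
for k, q_k in enumerate(q, start=1):
    prob_default = survive * q_k   # P(K = k): no earlier default, default at k
    survive *= 1 - q_k             # probability of no default through payment k
    term = v ** k * (6 - k) * prob_default  # discounted outstanding balance
    rows.append((k, q_k, survive, prob_default, term))
    apv_units += term

for k, q_k, s, p_k, term in rows:
    print(f"{k} | {q_k:.2f} | {s:.4f} | {p_k:.4f} | {term:.4f}")
print(f"APV = {apv_units:.4f} units = ${10_000 * apv_units:,.0f}")
```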


3.2 Random payments 


In general, the size of a benefit may depend not only on the time of “failure” (say, death 
or product-failure), but on other circumstances as well. In this case, the size of the payment 
may be random, even if the time of failure is known. Formally, this means that in the 
general model, we should consider r.v.’s rather than certain payment functions. We restrict 
ourselves to one example. 


EXAMPLE 1. Let us consider an insurance against unemployment and call the loss 
of job “failure”. The insurance covers a period of 10 years. Assume the probability that 
failure occurs during this period to be 0.1; that is, 0.9 is the probability that the insured will 
either survive 10 years not losing the job or will die within the period [0,10] having the 
job. Assume also that all moments in the interval [0,10] are equally likely to be a failure 
moment. 

Denote by $\xi$ the time, starting from the failure moment, at which the insured will either find a new job or die. We assume that the company reimburses the loss of the salary by a single payment at the time mentioned. Such an assumption is somewhat artificial, but makes the example simpler. In Chapter 9 where we study annuities, in Example 9.3.1-4, we consider the same problem assuming that the company pays to the insured her/his current salary from the moment of failure until the moment when the insured gets a job or dies. In Exercise 42, we consider the problem under the additional assumption that if the insured does not find a job during a year, then the company pays the annual salary, and the contract terminates. This also makes the problem more realistic.

Now, suppose $\xi$ is exponentially distributed with parameter $b = 2$ and independent of the time of failure. The salary is growing exponentially at a rate of 4%; the initial annual salary rate equals 1. The interest rate $\delta = 6\%$. If the insured loses the job within the interval $[0,10]$, the company fulfills its obligations in full even if the insured finds a job (or dies) after the ten year period. For simplicity, we will provide all calculations in the scheme of continuous time.

We deal with a 10-year-term insurance with the exception that, at the moment of failure, the real size of the future payment is unknown. If the failure occurs at moment $t$, the lost annual salary is $e^{0.04t}$. The company will reimburse this loss at the random moment $t + \xi$, and the amount of reimbursement will be equal to $e^{0.04t}\xi$. (We measure time in years, and deal with annual salaries. The company pays the salary that was lost, and does not take into account that the salary would have been growing if the insured had not lost the job.)

The payment will be made at the moment $t + \xi$. Consequently, the present value of such a payment is the r.v.

$$R_t = e^{-\delta(t+\xi)}e^{0.04t}\xi = e^{-0.06(t+\xi)}e^{0.04t}\xi = e^{-0.02t}\,\xi e^{-0.06\xi}.$$




Denote by $X$ a r.v. uniformly distributed on $[0,10]$. We identify $X$ with the moment of failure if it occurs. Then, for the present value of the total benefit, we can write the representation

$$Z = \begin{cases} 0 & \text{with probability } 0.9,\\ e^{-0.02X}\,\xi e^{-0.06\xi} & \text{with probability } 0.1. \end{cases}$$

The actuarial present value

$$A = E\{Z\} = 0.1\cdot E\{e^{-0.02X}\,\xi e^{-0.06\xi}\} = 0.1\cdot E\{e^{-0.02X}\}\,E\{\xi e^{-0.06\xi}\},$$

because $X$ and $\xi$ are independent. Since $X$ is uniform on $[0,10]$, the expectation $E\{e^{-0.02X}\} = M_X(-0.02) = (1 - e^{-10\cdot 0.02})/(0.02\cdot 10) \approx 0.906$. For the second expectation, we have

$$E\{\xi e^{-0.06\xi}\} = \int_0^\infty te^{-0.06t}f_\xi(t)\,dt = \int_0^\infty te^{-0.06t}\cdot 2e^{-2t}\,dt \approx 0.471.$$

Thus, $A \approx 0.1\cdot 0.906\cdot 0.471 \approx 0.0427$.
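The two expectations have simple closed forms, which gives a quick check of the numbers above. In the sketch below the variable names are ours, and the identity $E\{\xi e^{-\delta\xi}\} = b/(b+\delta)^2$ for an exponential $\xi$ with parameter $b$ is the result of evaluating the last integral by parts.

```python
import math

p_failure = 0.1   # probability that the job is lost within [0, 10]
horizon = 10      # failure time X is uniform on [0, horizon]
delta, growth = 0.06, 0.04
b = 2.0           # parameter of the exponential search time xi

# E{exp(-(delta - growth) * X)} for X uniform on [0, horizon]
a = delta - growth
expect_x = (1 - math.exp(-a * horizon)) / (a * horizon)

# E{xi * exp(-delta * xi)} = b / (b + delta)^2 for xi exponential with parameter b
expect_xi = b / (b + delta) ** 2

apv = p_failure * expect_x * expect_xi
print(f"{expect_x:.3f} * {expect_xi:.3f} * {p_failure} = {apv:.4f}")
```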


4 MULTIPLE DECREMENT AND MULTIPLE LIFE MODELS 
4.1 Multiple decrements 


We use the general framework and the notation from Section 7.2.1. When it does not 
cause misunderstanding, we omit the symbol x denoting the age of the insured. 

Assume that the benefit to be paid depends on the moment of payment and the cause of failure. Denote by $c_{jt}$ the amount of the benefit to be paid if the failure occurs at moment $t$ from cause $j$. We denote the time of payment by $\Psi$, and the number of the cause by $J$. Both quantities are random. The present value of the insurance is the r.v.

$$Z = c_{J\Psi}e^{-\delta\Psi}, \tag{4.1.1}$$

and the APV

$$A = E\{c_{J\Psi}e^{-\delta\Psi}\}. \tag{4.1.2}$$

[Compare with (3.1.1) and (3.1.2). In this subsection, we will use the symbol $A$ for all types of insurance.]

For example, for the whole life insurance $\Psi = T = T(x)$, and we can compute the expectation (4.1.2) in the standard way in terms of the joint density $f(t,j)$ defined in Section 7.2.1. Keeping in mind that the r.v. $T$ is continuous and $J$ is discrete, we write

$$A = E\{c_{JT}e^{-\delta T}\} = \sum_{j=1}^m\int_0^\infty c_{jt}e^{-\delta t}f(t,j)\,dt,$$


where m is the number of possible causes. 
If the benefits are payable at the end of the year of death, then $\Psi = K+1 = K(x)+1$, and

$$A = E\{c_{J,K+1}e^{-\delta(K+1)}\} = \sum_{j=1}^m\sum_{k=0}^\infty c_{j,k+1}e^{-\delta(k+1)}P(K(x)=k,\,J=j).$$

To compute the above probability, one can write $P(K(x)=k,\,J=j) = \int_k^{k+1}f(t,j)\,dt$.

It is convenient to “condition” the expectation in (4.1.2) with respect to $\Psi$. Consider again the case $\Psi = T$. In accordance with the general rules (0.7.2.1) and (0.7.2.3),

$$A = E\{c_{JT}e^{-\delta T}\} = E\{E\{c_{JT}e^{-\delta T}\,|\,T\}\} = E\{E\{c_{JT}\,|\,T\}e^{-\delta T}\}. \tag{4.1.3}$$


Set

$$c(T) = E\{c_{JT}\,|\,T\}.$$

This is the mean payment given the failure moment $T$. When $T$ assumes a value $t$, the r.v. $c(T)$ takes on the value $c(t)$. Naturally, $c(t) = E\{c_{Jt}\,|\,T = t\}$.

By (4.1.3),

$$A = E\{c(T)e^{-\delta T}\} = \int_0^\infty c(t)e^{-\delta t}f_T(t)\,dt, \tag{4.1.4}$$

where $f_T(t)$ is the density of $T$; see also Section 7.2.1.

Our first goal is to compute $c(t)$. As in Section 7.2.1, let $\mu(t)$ and $\mu^{(j)}(t)$ denote the total force of decrement and the force of decrement due to cause $j$, respectively. (We omit the index $x$.) Recall that $\mu(t) = \sum_{j=1}^m\mu^{(j)}(t)$, and

$$P(J = j\,|\,T = t) = \frac{\mu^{(j)}(t)}{\mu(t)}. \tag{4.1.5}$$

Then

$$c(t) = \sum_{j=1}^m c_{jt}P(J = j\,|\,T = t) = \sum_{j=1}^m c_{jt}\,\frac{\mu^{(j)}(t)}{\mu(t)}. \tag{4.1.6}$$

EXAMPLE 1. A whole life insurance for a particular type of client pays 20 units if death comes from natural causes and 10 units if it is a result of an accident. The respective hazard rates are $\mu^{(1)}(t) = 0.03$ and $\mu^{(2)}(t) = 0.01$. (The age $x$ is suppressed in the notation.) Find the APV for $\delta = 0.05$.

Thus, $c_{1t} = 20$ and $c_{2t} = 10$. The total mortality force is $\mu(t) = \mu^{(1)}(t) + \mu^{(2)}(t) = 0.04$, so $T$ is exponentially distributed with parameter $\mu = 0.04$. By (4.1.5), $P(J = 1\,|\,T = t) = 0.75$, and by (4.1.6), $c(t) = 20\cdot 0.75 + 10\cdot 0.25 = 17.5$. In view of (4.1.4),

$$A = 17.5\int_0^\infty e^{-\delta t}f_T(t)\,dt = 17.5\,M_T(-\delta) = 17.5\,\frac{\mu}{\mu+\delta} = 17.5\cdot\frac{4}{9} = \frac{70}{9} \approx 7.78.$$
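For constant forces of decrement and constant benefits, (4.1.4)-(4.1.6) reduce to a one-line computation. The sketch below packages it in a small function; the name `apv_constant_forces` is ours.

```python
def apv_constant_forces(benefits, forces, delta):
    """APV of a whole life insurance with cause-dependent level benefits.

    Assumes constant forces of decrement, so T is exponential with
    parameter mu = sum(forces) and c(t) in (4.1.6) does not depend on t.
    """
    mu = sum(forces)
    mean_benefit = sum(c * m / mu for c, m in zip(benefits, forces))  # (4.1.6)
    return mean_benefit * mu / (mu + delta)  # c * M_T(-delta), cf. (4.1.4)


apv = apv_constant_forces(benefits=[20, 10], forces=[0.03, 0.01], delta=0.05)
print(f"APV = {apv:.4f}")
```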


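To come to another convenient representation, recall that $f_T(t) = \mu(t)\,{}_tp$, where ${}_tp = P(T > t)$. (See (7.1.2.4); we have omitted $x$.) Then, by virtue of (4.1.6),

$$c(t)f_T(t) = \sum_{j=1}^m c_{jt}\,\frac{\mu^{(j)}(t)}{\mu(t)}\,\mu(t)\,{}_tp = {}_tp\sum_{j=1}^m c_{jt}\,\mu^{(j)}(t).$$

Substituting it into (4.1.4), we have

$$A = \int_0^\infty\left(\sum_{j=1}^m c_{jt}\,\mu^{(j)}(t)\right)e^{-\delta t}\,{}_tp\,dt. \tag{4.1.7}$$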



EXAMPLE 2. A special whole life insurance on (50) pays 2 units if death is a result of an accident, and the tripled current annual income if death comes from other causes. The force of mortality in the latter case is $\mu^{(1)}(t) = \frac{1}{50-t}$. The hazard rate for the time of the accident is $\mu^{(2)}(t) = \mu = 0.01$. Suppose that the income grows as $e^{0.02t}$ until the time $t = 20$, and after this time it remains constant. The rate $\delta = 0.05$. Find the APV.

To find the distribution of $T(50)$, we can use general formulas from 7.2.1, but it is more convenient to appeal to the scheme of Section 7.2.2.

In Example 7.2.2-3, we have found that for the life time $T(x)$ under consideration, ${}_tp_{50} = (1 - t/50)e^{-\mu t}$ for $t \le 50$.

The payoff function $c_{1t} = 3\min\{e^{0.02t}, e^{0.02\cdot 20}\} = 3\min\{e^{0.02t}, e^{0.4}\}$, while $c_{2t} = 2$. Hence, by (4.1.7),

$$A = \int_0^{50}\left(3\min\{e^{0.02t}, e^{0.4}\}\,\frac{1}{50-t} + 2\mu\right)e^{-\delta t}(1 - t/50)e^{-\mu t}\,dt$$

$$= \frac{3}{50}\int_0^{50}\min\{e^{0.02t}, e^{0.4}\}\,e^{-(\delta+\mu)t}\,dt + 2\mu\int_0^{50}e^{-\delta t}(1 - t/50)e^{-\mu t}\,dt.$$

Both integrals may be computed directly, but nowadays one may use software. In any case, $A \approx 1.428$.
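One way to "use software" here is a composite Simpson rule, sketched below. The helper `simpson` and the variable names are ours. The factor $(1 - t/50)/(50 - t)$ is simplified to $1/50$ to avoid dividing by zero at $t = 50$, and the interval is split at the kink $t = 20$ of the benefit function.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


delta, mu_accident = 0.05, 0.01


def integrand(t):
    """Integrand of (4.1.7), with (1 - t/50) / (50 - t) replaced by 1/50."""
    income_benefit = 3 * min(math.exp(0.02 * t), math.exp(0.4))
    other_causes = income_benefit / 50                      # c_1t * mu^(1)(t) * (1 - t/50)
    accident = 2 * mu_accident * (1 - t / 50)               # c_2t * mu^(2)(t) * (1 - t/50)
    return (other_causes + accident) * math.exp(-(delta + mu_accident) * t)


apv = simpson(integrand, 0, 20) + simpson(integrand, 20, 50)
print(f"APV = {apv:.4f}")
```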


EXAMPLE 3. The company where Mr. T. is working pays a special death benefit of 
$5000 times the number of years in service. The benefit is paid at the end of the year of 
death, provided it occurs while Mr. T. is still in service. A non-complete year is counted. 
Mr. T. is now exactly 61. He has been in service for 10 years prior to the present time. 
During the period under consideration, Mr. T. can withdraw from this insurance plan (say, 
changing his job or retiring). The information on the forces of decrement for the initial group of $l_{61} = 100$ people of age 61 is given in the first four columns of the table below; $v = 0.96$. We consider death as the cause 1, and withdrawal as cause 2; $k$ is the age at the beginning of the corresponding period.


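$k$ | $c_{1,k+1}$ | $r_k$ | $v^{k-60}c_{1,k+1}r_k$
61 | 11 | 0.01 | 0.1056
62 | 12 | 0.02 | 0.221184
63 | 13 | 0.03 | 0.34504704
64 |  | 0 | 0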


The third and fourth columns contain the number of people who left the group in the period $k$ for the first and the second cause, respectively. In particular, from the table it follows that once Mr. T. attains the age of 64, he will immediately retire. Let, for a moment, $5000 be a unit of money. If Mr. T. survives $k$ complete years ($K = k$), then the payment will be $c_{1,k+1} = 10 + k - 60 = k - 50$, $c_{2,k+1} = 0$. Set $r_k = P(K = k,\,J = 1)$. Then

$$r_k = \frac{l_k}{l_{61}}\cdot\frac{d_k^{(1)}}{l_k} = \frac{d_k^{(1)}}{l_{61}},$$

which is reflected in the sixth column. If the payment is provided in the first year, the discount is $v$; for the year $k$, the discount is $v^{k-60}$. Thus, the APV per $1 is $\sum_{k=61}^{63}v^{k-60}c_{1,k+1}r_k$; see the seventh column. The sum of the numbers in this column is 0.67183104, so the APV is $5000 · 0.67183104 ≈ $3,359.16.


4.2 Multiple life insurance 


Consider, for example, the last-survivor status of two lives, $\overline{x:y}$, and an insurance that provides for payment of some benefits at the moment of the last death. Following the logic of the traditional notation, it is natural to denote the APV in this case by $\bar A_{\overline{x:y}}$. It would have been very nice if we had been able to reduce the calculation of $\bar A_{\overline{x:y}}$ to the calculation of the “separate” APVs, $\bar A_x$ and $\bar A_y$. However, as we will see, as a rule it is impossible. So, the straightforward approach may turn out to be optimal: find the distribution of the lifetime $T_{\overline{x:y}}$ of the status, treat it as one life, and compute the corresponding APV.


EXAMPLE 1. Let us revisit Example 7.3.2-1. We found there that the density

$$f_{T(\overline{x:y})}(t) = C_k\left(2t - \tfrac{2}{3}kt^3\right),$$

where $C_k = 6/(6-k)$, and the parameter $k < 1$ characterizes the dependence between the two lifetimes, $T_1$ and $T_2$. For $k = 0$ the lifetimes are independent, and the larger $k$ is, the stronger the dependence between $T_1$ and $T_2$. To make it more interesting, consider growing benefits; for example, assume $c_t = t$. Then the corresponding notation should be $(\bar I\bar A)_{\overline{x:y}}$, but to make it simpler, we will write just $\bar A_{\overline{x:y}}$.

Let $\delta = 0.1$. (Since the whole time period under consideration is one, it is natural to assume that the unit of time is not one year, and hence it is natural to take a larger interest rate than a usual annual interest rate.) We have

$$\bar A_{\overline{x:y}} = \int_0^1te^{-\delta t}f_{T(\overline{x:y})}(t)\,dt = C_k\int_0^1te^{-0.1t}\left(2t - \tfrac{2}{3}kt^3\right)dt.$$

One may compute this integral directly (by parts) or use software to obtain that, taking constants up to the second digit,

$$\bar A_{\overline{x:y}} \approx \frac{3.72 - 0.72k}{6-k}.$$

The last function is slowly decreasing in $k$ from 0.62 to 0.5968, so the more dependent the lives are, the smaller the APV.


As in Section 7.3, denote by $T_1$ and $T_2$ the separate lifetimes in a status. Note that for any function $\varphi(t)$,

$$\varphi(\min\{T_1, T_2\}) + \varphi(\max\{T_1, T_2\}) = \varphi(T_1) + \varphi(T_2). \tag{4.2.1}$$

This simple observation leads to a nice connection between the insurances on the last-survivor and the joint-life status. Assume that the present value of an insurance depends (perhaps in a complicated way) only on the lifetime $T$ of the status and is represented by a function $\varphi(T)$. For example, if we consider an insurance paying $c_t$ at time $t$, then $\varphi(t) = c_te^{-\delta t}$.

The APV of an insurance that is specified by $\varphi(t)$ is $E\{\varphi(T)\}$. Then, by virtue of (4.2.1),

$$\bar A_{x:y} + \bar A_{\overline{x:y}} = \bar A_x + \bar A_y, \tag{4.2.2}$$

where $\bar A_{x:y}$, $\bar A_{\overline{x:y}}$, $\bar A_x$, and $\bar A_y$ are the APVs of the same insurance applied to the statuses $x:y$, $\overline{x:y}$, and separate lives $(x)$, $(y)$, respectively.

Certainly, a similar formula is true for the discrete time case. Because the function $\varphi(t)$ is arbitrary, the relation (4.2.2) concerns all types of insurances we considered earlier (temporal, deferred, etc.).


EXAMPLE 2 ([158, N13]). You are given the following information about two policyholders who are age $x$ and age $y$, respectively: (a) The future lifetimes of $(x)$ and $(y)$ are independent; (b) The force of mortality is constant, with $\mu_x = 0.03$ and $\mu_y = 0.08$; (c) $\delta = 0.05$; (d) A fully continuous last survivor insurance on $(x)$ and $(y)$ pays a benefit of 100,000.

Calculate the actuarial present value of this insurance benefit.

First, we recollect that for the continuous-time exponential case, $\bar A_x = \frac{\mu}{\mu+\delta}$; see (1.1.8).

Furthermore, in the independency case, the mortality force of $T(x:y)$ is the sum of the mortality forces for $T(x)$ and $T(y)$. Thus, $T(x:y)$ is an exponential r.v. with the mortality force $\mu_x + \mu_y$. (For more detail, see, for instance, Example 7.3.2-6.)

Keeping all of this in mind and proceeding from (4.2.2), we have

$$\bar A_{\overline{x:y}} = \bar A_x + \bar A_y - \bar A_{x:y} = \frac{\mu_x}{\mu_x+\delta} + \frac{\mu_y}{\mu_y+\delta} - \frac{\mu_x+\mu_y}{\mu_x+\mu_y+\delta}$$

$$= \frac{0.03}{0.03+0.05} + \frac{0.08}{0.08+0.05} - \frac{0.11}{0.11+0.05} \approx 0.30288.$$

Multiplying this by 100,000, we get $30,288.
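The computation is short enough to script directly from (4.2.2). In this sketch the helper name `whole_life_constant_force` is ours; it is the formula $\mu/(\mu+\delta)$ recalled above.

```python
def whole_life_constant_force(mu, delta):
    """Continuous whole life APV for a constant force of mortality: mu / (mu + delta)."""
    return mu / (mu + delta)


mu_x, mu_y, delta = 0.03, 0.08, 0.05

# Joint-life status: independent exponential lifetimes, forces add up.
A_joint = whole_life_constant_force(mu_x + mu_y, delta)
# Last-survivor status via relation (4.2.2).
A_last = (
    whole_life_constant_force(mu_x, delta)
    + whole_life_constant_force(mu_y, delta)
    - A_joint
)
print(f"APV per unit = {A_last:.5f}, for 100,000: {100_000 * A_last:,.0f}")
```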

EXAMPLE 3. Return to Example 7.3.2-2. Consider a whole life insurance with the payment $c_t = t$. Set $\delta = 0.04$. We should again use the symbol $(\bar I\bar A)_{50:60}$, but for simplicity we will write just $\bar A$.

The r.v. $T_1$ is uniform on $[0,50]$, and $T_2$ is exponential with $\mu = 0.05$. Since the lifetimes are independent, by virtue of (7.3.2.2) or (7.3.2.5), the survival function for $\min\{T_1, T_2\}$ is

$${}_tp_{x:y} = {}_tp_x\cdot{}_tp_y = (1 - t/50)e^{-0.05t} \text{ for } t \le 50, \text{ and } = 0 \text{ otherwise}.$$

The d.f. ${}_tq_{x:y} = 1 - {}_tp_{x:y}$, and the density of $T_{x:y}$ is

$$f(t) = \frac{d}{dt}\,{}_tq_{x:y} = -\frac{d}{dt}\,{}_tp_{x:y} = (0.07 - 0.001t)e^{-0.05t} \text{ for } t \le 50, \text{ and } = 0 \text{ otherwise}.$$

Then for the joint-life status,

$$\bar A_{50:60} = \int_0^\infty te^{-\delta t}f(t)\,dt = \int_0^{50}te^{-0.04t}(0.07 - 0.001t)e^{-0.05t}\,dt = \int_0^{50}t(0.07 - 0.001t)e^{-0.09t}\,dt \approx 5.847.$$

Reprinted with permission of the Casualty Actuarial Society. 




(Again, we can integrate by parts or use software.)

Next, since $T_1$ is uniformly distributed on $[0,50]$,

$$\bar A_{50} = \int_0^{50}te^{-\delta t}\,\frac{1}{50}\,dt = \int_0^{50}te^{-0.04t}\,\frac{1}{50}\,dt \approx 7.425.$$

For the exponentially distributed r.v. $T_2$, we have

$$\bar A_{60} = \int_0^\infty te^{-\delta t}\mu e^{-\mu t}\,dt = 0.05\int_0^\infty te^{-0.09t}\,dt \approx 6.173.$$

Then, by (4.2.2),

$$\bar A_{\overline{50:60}} = \bar A_{50} + \bar A_{60} - \bar A_{50:60} \approx 7.751.$$

The reader can check the calculations above, but at least we got $\bar A_{\overline{50:60}}$ greater than $\bar A_{50:60}$. Would it be true if the benefits had been level? See Exercise 52.
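The three integrals of this example can be checked with the elementary antiderivatives of $te^{-at}$ and $t^2e^{-at}$. The sketch below does so; the helper names `integral_t` and `integral_t2` are ours.

```python
import math


def integral_t(a, upper):
    """Integral of t * exp(-a t) over [0, upper]."""
    return (1 - math.exp(-a * upper) * (1 + a * upper)) / a ** 2


def integral_t2(a, upper):
    """Integral of t^2 * exp(-a t) over [0, upper]."""
    e = math.exp(-a * upper)
    return (2 - e * (a ** 2 * upper ** 2 + 2 * a * upper + 2)) / a ** 3


delta, mu = 0.04, 0.05

# Joint-life status: density (0.07 - 0.001 t) exp(-0.05 t) on [0, 50].
A_joint = 0.07 * integral_t(delta + mu, 50) - 0.001 * integral_t2(delta + mu, 50)
# Single lives: T_1 uniform on [0, 50], T_2 exponential with parameter mu.
A_50 = integral_t(delta, 50) / 50
A_60 = mu / (mu + delta) ** 2  # integral of t exp(-delta t) mu exp(-mu t) over [0, inf)
# Last-survivor status via (4.2.2).
A_last = A_50 + A_60 - A_joint

print(f"joint = {A_joint:.3f}, A_50 = {A_50:.3f}, A_60 = {A_60:.3f}, last = {A_last:.3f}")
```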


EXAMPLE 4. (a) The lifetimes $T_1$ and $T_2$ of a husband and wife are independent and uniformly distributed on $[0,50]$; the discount factor is 0.96. A special insurance pays one unit upon the death of the husband, provided that he dies first. Find the APV and the variance of the present value.

First, to make formulas nicer, we consider 50 years as a unit of time, replacing $[0,50]$ by $[0,1]$. Denote by $\delta$ the interest rate corresponding to this new time scale. The present value $Z$ takes on a non-zero value $e^{-\delta t}$ if $T_1$ gets into the interval $[t, t+dt]$ and $T_2 > t$. The probability of the product of these events is $1\cdot dt\cdot P(T_2 > t) = (1-t)\,dt$. Then

$$E\{Z\} = \int_0^1e^{-\delta t}(1-t)\,dt = \frac{1}{\delta^2}\left(e^{-\delta} - 1 + \delta\right), \tag{4.2.3}$$

which may be obtained, for example, by integration by parts.

The discount factor over 50 years is $v = (0.96)^{50}$, so $\delta = -\ln v = -50\ln(0.96) \approx 2.0411$. Inserting it into (4.2.3), we obtain the APV $A = E\{Z\} \approx 0.2811$.

By the double rate rule,

$$E\{Z^2\} = \frac{1}{(2\delta)^2}\left(e^{-2\delta} - 1 + 2\delta\right) \approx 0.1860,$$

and $Var\{Z\} \approx 0.1860 - (0.2811)^2 \approx 0.1070$.


(b) Let an insurance pay $c$ units at the moment of the husband's death if he dies first, and $\tilde c$ units if he dies after his wife. Say, $c = 1$ and $\tilde c = 1/2$.

Denote by $Z$ the present value of the insurance in the previous case (a), and by $Z_1$ the present value of an individual life insurance for the husband. Then the present value of the insurance under consideration is the r.v. $Z_2 = \tilde cZ_1 + (c - \tilde c)Z$ (since $Z = 0$ if $T_1 > T_2$). Next, $E\{Z_1\} = \int_0^1e^{-\delta t}\,dt = \frac{1}{\delta}\left(1 - e^{-\delta}\right) \approx 0.4263$, and

$$E\{Z_2\} = \tilde cE\{Z_1\} + (c - \tilde c)E\{Z\} \approx \frac{1}{2}\cdot 0.4263 + \frac{1}{2}\cdot 0.2811 = 0.3537.$$

We will continue to consider this problem in Exercises 56-57.
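Parts (a) and (b) can be evaluated together in a few lines. The sketch below uses formula (4.2.3) at the rates $\delta$ and $2\delta$; the function name `first_death_moment` and the variable names are ours.

```python
import math

# 50 years is taken as the unit of time, so delta = -50 * ln(0.96).
delta = -50 * math.log(0.96)


def first_death_moment(rate):
    """Right-hand side of (4.2.3): (exp(-rate) - 1 + rate) / rate^2."""
    return (math.exp(-rate) - 1 + rate) / rate ** 2


# Part (a): one unit paid at the husband's death if he dies first.
mean_z = first_death_moment(delta)
second_moment_z = first_death_moment(2 * delta)  # double rate rule
variance_z = second_moment_z - mean_z ** 2

# Part (b): c if the husband dies first, c_tilde if he dies second.
c, c_tilde = 1.0, 0.5
mean_z1 = (1 - math.exp(-delta)) / delta  # individual insurance on the husband
mean_z2 = c_tilde * mean_z1 + (c - c_tilde) * mean_z

print(f"delta = {delta:.4f}, E(Z) = {mean_z:.4f}, E(Z^2) = {second_moment_z:.4f}")
print(f"Var(Z) = {variance_z:.4f}, E(Z_1) = {mean_z1:.4f}, E(Z_2) = {mean_z2:.4f}")
```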




5 ON THE ACTUARIAL NOTATION 


The reader certainly has grasped the logic of the actuarial notation we used above. It is traditional, and it is guided and revised by the International Actuarial Association's Permanent Committee on Notation. Some general features of the notation system may be seen in Fig. 2.


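[FIGURE 2. The structure of the symbol ${}^{2}\bar A^{1}_{x:\overline{n}|}$: the bar shows the "type of time", the upper
left index the doubled interest rate, the upper right index the type of the contract, and the
angle the length of the term. For varying benefits: $(\bar I\bar A)$ continuously increasing, $(IA)$ discretely
increasing, $(\bar D\bar A)$ continuously decreasing, $(DA)$ discretely decreasing.]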


6 EXERCISES 


The use of software for computing integrals or sums in problems below is recommended. 


Sections 1 and 2 
1. (a) Show that obviously $\bar A_x \le 1$ and $A_x \le 1$. Can we write for one of these characteristics
   a more precise bound?

   (b) Usually, the life insurance for a younger person costs less than for an older person.
   Why? Is it true for $\delta = 0$? Let $\delta > 0$. Is it always true that $\bar A_{x+1} > \bar A_x$ and $A_{x+1} > A_x$?
   Give, first, a heuristic argument, and then justify your answer mathematically. (Hint:
   Think about "dangerous ages". Is $\mu(x)$ monotone?)

2. (a) Can we always expect that $A_x \le \bar A_x$? (b) Can $A_x = \bar A_x$? (c) When is $A_x$ close to $\bar A_x$?

3. Compute $\mathrm{Var}\{Z\}$ in Example 1.1-2.


4. Under the assumption $\mu(x) = \mu$,

   $$\bar A_x = \frac{\mu}{\mu + \delta}, \qquad A_x = \frac{1 - e^{-\mu}}{e^{\delta} - e^{-\mu}}. \tag{6.1}$$

   The first formula was obtained in (1.1.8). Prove the second. Recalling that $e^x = 1 + x + o(x)$
   for small $x$, compare the two formulas in (6.1) for small $\mu$ and $\delta$. (Advice: First, look
   up Exercise 7-28. Then either derive (6.1) directly, or write the m.g.f.'s of $K(x)$ following
   (0.4.3.1).)

5. In the situation of Exercise 4, write formulas for the variance for both cases. (Hint: You do
   not have to compute anything; the answer is almost immediate.)

6. Assume that the force of mortality is constant and $\bar A_x = 0.5$. Find the variance of the present
   value for this type of insurance.

7. (a) Let the density $f_{T(x)}(t) = 2(a-t)/a^2$ for $t \in [0,a]$. Graph $f_{T(x)}(t)$ for all $t$. Show that
   in this case,
   $$\bar A_x = \frac{2}{a\delta} - \frac{2}{a^2\delta^2}\left(1 - e^{-a\delta}\right).$$
   (Advice: Compute the integral $\int_0^a e^{-\delta t}\,[2(a-t)/a^2]\,dt$ using integration by parts, writing
   it as $\int_0^a [2(a-t)/a^2]\,d(-e^{-\delta t}/\delta)$.)

   (b) Estimate $\bar A_{60}$ in the situation of Exercise 7-35 for $\delta = 0.04$.

8. Using the data from Table 7-1.5.1-1, provide an Excel worksheet for computing $A_x$ for
   $x = 60, \dots, 89$ in the situation of Example 2.1.2-2. Analyze how $A_x$ changes as $x$ varies.

9. For a certain population, we have $A_{30} = 0.2$ if $\delta = 3\%$, and $A_{30} = 0.09$ if $\delta = 6\%$. One
   hundred people at an age of 30 from this population bought the whole life insurance. Proceeding
   from $\delta = 3\%$, find a single premium sufficient for the company to fulfill its obligation with a
   probability of 0.99.

10. Consider $n = 100$ whole life insurance contracts with a benefit of \$100,000. Let all lives be
    independent and distributed as in Example 1.1-1, and $\delta = 0.04$. Find the total single premium
    sufficient for the company to carry out all payments with probability 95%.

11. Given a rate $\delta$, find the distribution of the present value $Z$ in the case of the whole life
    insurance ($\Psi = T$) under the assumption that the force of mortality is a constant $\mu$. When
    is $Z$ distributed uniformly on [0,1]? (Hint: $P(Z \le x) = P(e^{-\delta T} \le x) = P(T \ge -(\ln x)/\delta) =
    1 - F_T(-(\ln x)/\delta)$, where $F_T(x)$ is the d.f. of $T$.)

12. Solve the problem from Example 2.2.1-1 by using (2.2.2).

13. Give an example of a particular distribution for which the approximation formula (2.1.7) is
    precise. Show it directly. Is this the only possible example?

14. (a) Find ${}_{10|}\bar A_{50}$ if $X$ is uniformly distributed on [0,100] and $\delta = 0.04$. (Advice: You can
    use the results of the calculations in Example 1.1-1 and (2.2.3).)

    (b) Find ${}_{10|}A_{50}$ in the same situation. (Hint: The distribution is uniform everywhere,
    including the intervals between complete years.)

15. Following just from the corresponding definitions, give a heuristic explanation of the
    formulas $\bar A_{x:\overline{n}|} = \bar A^{1}_{x:\overline{n}|} + {}_nE_x$ and $A_{x:\overline{n}|} = A^{1}_{x:\overline{n}|} + {}_nE_x$.

16. Which is larger: $A_x$ or $A_{x:\overline{n}|}$?

17. Reasoning similarly to what we did when proving (2.1.6), show that (2.3.3) is almost obvious
    from a heuristic point of view.


18. (a) Consider two types of insurances, the $n$-year-term and the $n$-year-deferred, under the
    same conditions (the size of benefits, the discount, and the survival function). Are the
    corresponding present values positively or negatively correlated, or neither?

    (b) Answer the same question for the $n$-year-term and the $n$-year-pure-endowment
    insurances.

    (c) In the situation of Problem 18b, assume that you know the actuarial values and the
    variances for the insurances mentioned. How would you calculate the actuarial value
    and the variance for the $n$-year-endowment insurance?

19. Let $A_x = u$, $A_{x+n} = s$, $A_{x:\overline{n}|} = w$. Find $A^{1}_{x:\overline{n}|}$.

20. For a certain population we have ${}_{35}p_{30} = 0.95$, and for the rate
    $\delta = 3\%$: $A_{30} = 0.2$, $A_{65} = 0.44$, while for
    $\delta = 6\%$: $A_{30} = 0.09$, $A_{65} = 0.235$.

    (a) Proceeding from the 3% rate, find $A_{30:\overline{35}|}$, $A^{1}_{30:\overline{35}|}$, ${}_{35|}A_{30}$.

    (b) In the case of 1000 clients and the 35-year endowment insurance, find the single
    premiums sufficient for the company to fulfill its obligations with a probability of 0.95.

21. An actuary uses a demographic model where the lifetime of 30% of newborns has a constant
    mortality rate of 1/50 year$^{-1}$, while for 70%, the rate is 1/80 year$^{-1}$. The interest rate equals
    5%. Under the above assumption, compute the expectation and the variance of the present
    value of the whole life insurance for a life-age-20 payable at the moment of death. (Advice:
    Use the result of Exercise 7-22.)

22. You figured out somehow that an insurance company estimates the net single premium (APV)
    for the whole life insurance which pays \$100,000 at the end of the year of death for
    30-year-old people as \$25,000; the same for 50-year-old people is \$40,000, and the
    20-year endowment insurance for 30-year-old people with the same benefit is \$55,000. You also
    know that the probability for a 30-year-old person to live 20 years more is about 0.95. From
    what average rate of interest did the actuary of the company proceed?

23. Take the values of $l_x$ for $x = 60, \dots, 70$ from Table 7-1.5.1-1, and provide an Excel worksheet
    to compute $A^{1}_{60:\overline{10}|}$ similarly to what we did in Example 2.3.2-2. Compute the variance using
    the same worksheet. For 100 analogous contracts, find a single premium per contract for the
    security level $\beta = 0.95$.

24. For the case of De Moivre's law with $\omega = 100$ and discount $\delta = 0.04$, find $A_{50}$ and
    $A_{50:\overline{20}|}$.

25. Prove (2.4.2) by using the rule of double rate (1.1.5).

26. Using Table 7-1.5.1-1, find ${}_{30}E_{30}$ for $\delta = 0.05$.

27. In each of the following, tell which quantity is larger: (i) $A_x$ or $\bar A_x$; (ii) $A_x$ or $A_{x:\overline{n}|}$;
    (iii) $A_x$ or $A^{1}_{x:\overline{n}|}$; (iv) $A_x$ or ${}_{m|}A_x$.

28. Find $\lim_{m\to\infty} {}_{m|}A_x$ and $\lim_{m\to\infty} {}_{m}E_x$. Explain why the answer remains true even if $\delta = 0$.

29. Consider two groups of clients of the same age $x$. In each group, the distribution of $T(x)$ is
    the same for all clients. However, if $T^{(1)}(x)$ and $T^{(2)}(x)$ are the future lifetimes for clients
    from the first and from the second group, respectively, then $P(T^{(1)}(x) > t) \ge P(T^{(2)}(x) > t)$
    for all $t$. Which group is healthier? Give a heuristic argument and then show rigorously that
    the APV for one group is larger than for the other, for any insurance we considered. Which
    group is it? The reader who read Section 1.3.5.1 recognizes that we are talking about the
    first stochastic dominance, but we do not need to know this notion in order to answer the
    above question. The reader who did not skip Section 1.3.5.2 is recommended to connect this
    problem with the notion of the FSD. (Hint: Heuristically it is almost obvious; for a rigorous
    proof one may use (0.2.2.1).)

30. Prove that ${}_{n|}\bar A_x = \bar A_x - \bar A^{1}_{x:\overline{n}|}$ and ${}_{n|}A_x = A_x - A^{1}_{x:\overline{n}|}$.

31. Which of the following formulas are correct?
    (a) $A_x = A^{1}_{x:\overline{n}|} + v^n\,{}_np_x\,A_{x+n}$;  (b) $A_x = A^{1}_{x:\overline{n}|} + {}_np_x\,A_{x+n}$.

32. Give heuristic and rigorous proofs of (2.4.5) and (2.4.7).

33. The characteristics $\bar A_x$, $A_x$, $\bar A_{x:\overline{n}|}$, $A_{x:\overline{n}|}$, $\bar A^{1}_{x:\overline{n}|}$, $A^{1}_{x:\overline{n}|}$, ${}_{m|}\bar A_x$, ${}_{m|}A_x$, and ${}_nE_x$ all depend on $\delta$.

    (a) Which of them, under a very mild condition, are decreasing in $\delta$?

    (b) For each APV, write without calculations the limit as $\delta \to 0$. (Hint: If $\delta = 0$, the present
    value of \$1 to be paid in the future is \$1.)

    (c) Make sure that the same limits follow from the corresponding mathematical formulas
    for the expectations.

34. Do we underestimate or overestimate the APV $A = E\{e^{-\delta\Psi}\}$ if, instead of computing $A$, we
    replace the r.v. $\Psi$ by its expected value $E\{\Psi\}$ in $Z = e^{-\delta\Psi}$? In particular, state which is larger:
    $A_x$ or $\exp\{-\delta(e_x + 1)\}$; and $\bar A_x$ or $\exp\{-\delta\,\mathring{e}_x\}$. (Advice: Appeal to Jensen's inequality from
    Section 1.3.4.2.)

35.* Denote by $A(\delta)$ the APV for an insurance with a unit benefit paid at some random time $\Psi$
    and for a rate of $\delta$. Using the result of Exercise 1-33, prove that, if $\delta_1 < \delta < \delta_2$, then

    $$[A(\delta_1)]^{\delta/\delta_1} \le A(\delta) \le [A(\delta_2)]^{\delta/\delta_2}. \tag{6.2}$$

    In the situation of Example 1.1-1 for $\omega = 100$ and $x = 60$, compute the exact values of the
    APVs for $\delta_1 = 0.04$, $\delta = 0.045$, and $\delta_2 = 0.05$, and analyze how much $A(\delta)$ differs from the
    left and right members of (6.2).


Section 3*

36. Does the rule of double rate work in the case of varying payments?

37. Find the APV for the whole life insurance with $c_t = e^{\alpha t}$ and $\mu(x) = \mu$. When does the problem
    have a solution? When is the insurance in this case equivalent to the whole life insurance with
    a level benefit but with another interest rate?

38. An insurance provides for payment of c_t = e°°"’ for clients with the constant force of
    mortality $\mu_1 = 0.02$. Another insurance provides for payment of one unit for another type of clients
    with a constant force of mortality $\mu_2$. It has turned out that the net single premiums for both
    insurances are the same. For which $\delta$ is it possible? Given $\delta$, find $\mu_2$.

39. Let the future lifetime $T(x)$ be uniformly distributed on $[0,n]$. Figure out, without any
    calculations, which is larger: $(DA)_{x:\overline{n}|}$ or $(IA)_{x:\overline{n}|}$. Explain why your answer is not true for all
    distributions of $T(x)$. (Hint: (a) When the benefit is increasing in time, larger values of the
    benefit correspond to the moments of time at which the present value of a unit to be paid is
    getting smaller. (b) Regarding the second question, it suffices to give an example, and this
    example may concern a r.v. which is "practically non-random".)


40. (a) Given $\delta$, find $(\bar I\bar A)_{60}$ (i) in the case when $X$ is uniform on [0,100]; (ii) when the force
    of mortality $\mu(x) = \mu$. (iii) In both cases, find the variances for $\delta = 0.04$ and $\mu = 0.05$.

    (b) For the cases (i), (ii) above, find $(\bar D\bar A)_{60:\overline{30}|}$.

41. Compute the APV for the 10-year-term and the 10-year-endowment insurances for the age
    of 50. The benefit is payable at the end of the year of death and equals $50 + 5m$ if the insured
    dies within the $m$th year. In the endowment case, 100 units are paid if the insured attains the
    age of 60. Survival probabilities may be taken from Table 7-1.5.1-1. Certainly, the best way
    is to use software.

42. Solve the problem of Example 3.2-1 under the additional assumption that if the insured does
    not find a job during a year, then the company pays the annual salary at the end of the year
    period after the failure, and the contract terminates.

43. A special whole life insurance policy on a life-age of 30 has a benefit that starts at 10 and
    increases by 2 per year until age 50. Starting with the age of 50, the benefit is level at 50. At
    the age of 60, the contract terminates. Find the APV in the case of discrete time and discount
    $v = 0.97$ if the density of $T(30)$ is constant on [0,30], and ${}_{30}p_{30} = 0.96$. (Advice: There are
    formulas for the sum at which you will arrive (see, e.g., (9.4.4)), but it is reasonable just to
    use software.)

44. Compute $(IA)_{70}$ if $T$ is uniformly distributed on [0,30] and $v = 0.96$. (Advice: There are
    formulas for the sum at which you will arrive (see, e.g., (9.4.4)), but it is reasonable just to
    use software.)

45. Explain the significance of the notation below and the formula itself:

    $$(IA)^{1}_{x:\overline{n}|} = \sum_{k=0}^{n-1}(k+1)\,v^{k+1}P(K = k) = \sum_{k=0}^{n-1}(k+1)\,v^{k+1}\,{}_kp_x\,q_{x+k}.$$

46. A 4-year endowment insurance with benefits payable at the end of the year of death of $(x)$
    is characterized by the table below; $k$ stands for values of the curtate lifetime. Find the APV
    for $v = 0.95$ and the probability that the present value will exceed the APV.

        k    {}_{k+1}p_x    benefit c_{k+1}
        0    0.99           5
        1    0.97           4
        2    0.94           2
        3    0.90           1


Section 4*

47. In the situation of Example 4.1-1, in the case of non-accidental death, the company returns
    the net single premium with a rate of 2%. In the case of an accidental death, the benefit is
    equal to 10. Find the APV. (Hint: You will come to an equation for $A$.)

48.⁶ A multiple decrement model has two causes with $\mu^{(1)}(t) = 0.2t$, $\mu^{(2)}(t) = 0.8t$ in some units.
    The benefit $c_j = 10j$, $j = 1,2$. (a) Write a precise formula for ${}_tp = P(T > t)$. (b) Write a
    precise value for $A$.


⁶The nice idea of this exercise is borrowed from [43, C7-5].


49. In the situation of Example 7.2.1-1, let $c_{1t} = t$, $c_{2t} = t^2$, and $\delta = 0.06$. Write $A_x$ in integral
    form. Using software, estimate the value.

50. In the situation of Exercise 7-49, let $c_{1t} = 1$, $c_{2t} = 2$, C3t oat ae $\delta = 0.06$. Write $A_x$ in
    integral form. Using software, estimate the value.

51. This problem is close to that of Example 4.1-3. A company has a special plan which pays
    (a) a death benefit of \$5000 times the number of years in service, paid at the end of the year
    of death, provided it occurs while the employee is still in service; (b) a one-time payment
    (above the retirement pension) of \$2000 times the number of years in service, paid at the end
    of the year of retirement either to the employee or to her/his beneficiary. Mr. T. is now exactly
    61 years old. His salary increases by 5% every year on his birthday, and his annual salary
    now is \$60,000. He has been in service for 10 years prior to the present time. During the
    period under consideration, Mr. T. can withdraw from this insurance plan (say, by changing
    his job). We consider death as cause 1, withdrawal as cause 2, and retirement as cause 3. The
    information on the forces of decrement for the initial group of $l_{61} = 100$ people of age 61 is
    given in the table below; $v = 0.97$. Provide a spreadsheet for computing the APV. (Advice:
    Look up, first, Section 7.2.3.)

        Age x    Cause 1    Cause 2    Cause 3
        61       0.02       0.01       0.05
        62       0.03       0.00       0.06
        63       0.04       0.00       0.07
        64       0.00       0.00       1.00

52. In the case of level benefit, which is larger: $A_{\overline{xy}}$ or $A_{xy}$?

53. In the situation of Example 7.3.1-1, for $k = 1$, $\delta = 0.1$, and $c_t = 1$, find $A_{\overline{xy}}$ and the
    corresponding variance. (Advice: You can do it directly, or you may use Example 4.2-1 and
    (4.2.2). All integrals that appear in this problem are tractable, but the use of software is more
    expeditious and is highly recommended.)

54. In the situation of Example 4.2-1, find $(\bar I\bar A)_{xy}$ for $k = 1$ and the corresponding variance.

55. In the situation of Example 7.3.1-1, find the APV for the insurance providing for payment of
    2 units on the first death and one unit on the second. Let $\delta = 0.1$ and, for simplicity, $k = 1$.
    Integrals which will arise are tractable but tedious, so either just write them down or use
    software.

56. Consider the problem of Example 4.2-4a when (i) $T_1$ and $T_2$ are exponentially distributed
    with the same parameter $a$; (ii) $T_1$ is uniform on [0,40], and $T_2$ is uniform on [0,50].

57. In the situation of Example 4.2-4a, find the APV and the variance of the present value for the
    following insurances.

    (a) An insurance providing payments of one unit upon each death. (Hint: You may reduce
    the problem to a very simple one.)

    (b) An insurance providing payment of two units upon the first death and one unit upon
    the second.

    (c) If the husband dies first, the insurance pays $c_1$ units upon the first death, and $c_2$ units
    upon the second. If the wife dies first, the insurance pays $c_3$ units upon the first death,
    and $c_4$ units upon the second.

58. Consider a status of two lives. Write a simple formula for the APV for the insurance paying
    one unit upon both deaths.


Chapter 9 


Annuity Models 


An annuity is a series of payments made at certain intervals (such as months or years) during
some period which is, as a rule, random. Typical examples are pensions, which are life
annuities paid while the retired person lives, or alimony, which is paid until one of the
spouses dies.

As a good (and often convenient) approximation, actuaries also use models where annuity
payments are carried out in a continuous-time fashion.

Regular payment of premiums by an insured, say, in the case of life insurance, is also an 
annuity. In this case, the annuitant—that is, the party receiving the annuity—is the insur- 
ance company, while the single payment of benefits constitutes the losses of the company. 

Accumulated values, though they are strongly connected with annuities, will be consid- 
ered in Section 10.1.3 after we consider the notion of a net premium rate. 

Below, we systematically explore two models: continuous-time and those where pay- 
ments are provided at the beginning of certain periods, that is, in a discrete way. 


1 TWO APPROACHES TO THE EVALUATION OF ANNUITIES 
1.1 Continuous annuities 


For certainty, we adopt a year as a time unit. 

Consider an annuity that is payable continuously at a rate $c_t$ depending, in general, on
time. More precisely, we assume that the payment during an infinitesimally small interval
$[t, t+dt]$ is equal to $c_t\,dt$. One may compare it with a water flow pouring into a basin with
an instant speed of $c_t$ at time $t$.

The present value of such a payment is equal to $v^t c_t\,dt = e^{-\delta t}c_t\,dt$, where $v = e^{-\delta}$ and $\delta$
is an annual rate of interest. Hence, if the payments are made during a time interval $[0,\Psi]$,
then the present value of the total payment is equal to

$$\bar Y = \int_0^{\Psi} e^{-\delta t}c_t\,dt. \tag{1.1.1}$$

Certainly, this is an abstraction, but it may serve as a good approximation if payments are
carried out sufficiently frequently, say, monthly. (We still keep a year as a time unit.)
Usually, $\Psi$ is random. In the case of life annuities on $(x)$, the r.v. $\Psi$ may coincide with
the future lifetime $T(x)$ or may differ from it.
The notation for the APV (or the net single premium) $E\{\bar Y\}$ is $\bar a$, with indices and signs
when needed.




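EXAMPLE 1. Consider an annuity on a life-age-$x$ with $\Psi = T = T(x)$. Assume for
simplicity that the force of mortality is a constant $\mu$.
(a) First, let the payment rate be constant; say, $c_t = 1$. Then

$$\bar Y = \int_0^T e^{-\delta t}\,dt = \frac{1}{\delta}\left(1 - e^{-\delta T}\right). \tag{1.1.2}$$

Since $T$ is an exponential r.v. with parameter $\mu$, the expectation $E\{e^{-\delta T}\} = \mu/(\mu+\delta)$ (see,
e.g., (8.1.1.8)), and

$$\bar a = E\{\bar Y\} = \frac{1}{\delta}\left(1 - E\{e^{-\delta T}\}\right) = \frac{1}{\delta}\left(1 - \frac{\mu}{\mu+\delta}\right) = \frac{1}{\mu+\delta}.$$

(b) Now consider a linearly growing payment rate; for example, $c_t = t$. Then

$$\bar Y = \int_0^T t\,e^{-\delta t}\,dt = -\frac{1}{\delta}\,T e^{-\delta T} + \frac{1}{\delta^2}\left(1 - e^{-\delta T}\right),$$

which may be obtained by integration by parts. For the net single premium, we have

$$\bar a = -\frac{1}{\delta}\,E\{T e^{-\delta T}\} + \frac{1}{\delta^2}\left(1 - E\{e^{-\delta T}\}\right). \tag{1.1.3}$$

The first term $E\{T e^{-\delta T}\} = \int_0^\infty t\,e^{-\delta t}\mu e^{-\mu t}\,dt = \mu/(\mu+\delta)^2$, as is easy to calculate integrating
again by parts. Substituting it into (1.1.3), replacing $E\{e^{-\delta T}\}$ by $\mu/(\mu+\delta)$, and doing some
algebra, we get that

$$\bar a = 1/(\mu+\delta)^2. \tag{1.1.4}$$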
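The aggregate computation of Example 1 can also be checked by simulation: draw exponential lifetimes, evaluate the random present value for each, and average. The following is only a sketch; the values of $\mu$ and $\delta$, the sample size, and all function names are our own assumptions rather than data from the text.

```python
import math
import random

def pv_level(T, delta):
    """Present value (1.1.2) of a unit-rate continuous annuity paid on [0, T]."""
    return (1.0 - math.exp(-delta * T)) / delta

def pv_linear(T, delta):
    """Present value of a continuous annuity with rate c_t = t paid on [0, T]."""
    disc = math.exp(-delta * T)
    return -T * disc / delta + (1.0 - disc) / delta**2

def simulate_apv(pv, mu, delta, n=200_000, seed=1):
    """Average the random present value over n simulated exponential lifetimes."""
    rng = random.Random(seed)
    return sum(pv(rng.expovariate(mu), delta) for _ in range(n)) / n

mu, delta = 0.02, 0.04  # hypothetical values chosen for illustration
level_mc = simulate_apv(pv_level, mu, delta)
linear_mc = simulate_apv(pv_linear, mu, delta)

print(level_mc, 1.0 / (mu + delta))        # Example 1a: 1/(mu + delta)
print(linear_mc, 1.0 / (mu + delta) ** 2)  # Example 1b: formula (1.1.4)
```

The simulated averages should agree with the closed forms up to Monte Carlo error, which shrinks like one over the square root of the sample size.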


We see that if $c_t$ is not constant, calculations turn out to be somewhat tedious even in
such a simple case as in Example 1b. The approach we consider next may be helpful in
computing APVs, and it is based on the following representation.

As usual, denote by $\mathbf{1}_A$ the indicator of an event $A$; more precisely, $\mathbf{1}_A = 1$ if $A$ occurs,
and $\mathbf{1}_A = 0$ otherwise. Note that $E\{\mathbf{1}_A\} = P(A)$. Then we can rewrite (1.1.1) as

$$\bar Y = \int_0^\infty e^{-\delta t}c_t\,\mathbf{1}_{\{\Psi > t\}}\,dt. \tag{1.1.5}$$

(Indeed, for $t < \Psi$ the integrand is the same as in (1.1.1), and for $t > \Psi$ the indicator equals
zero, so as a matter of fact, we integrate over $[0,\Psi]$.)

Taking the expectation of both sides and passing the expectation operation through the
integral, we come to

$$\bar a = E\{\bar Y\} = \int_0^\infty e^{-\delta t}c_t\,E\{\mathbf{1}_{\{\Psi > t\}}\}\,dt = \int_0^\infty e^{-\delta t}c_t\,P(\Psi > t)\,dt. \tag{1.1.6}$$

If $\Psi = T(x)$, then $P(\Psi > t) = P(T(x) > t) = {}_tp_x$. (Since $T$ is a continuous r.v., it does
not matter whether we write $T > t$ or $T \ge t$.) Denoting by $\bar a_x$ the APV of annuities in this
case, we have

$$\bar a_x = \int_0^\infty e^{-\delta t}c_t\,{}_tp_x\,dt. \tag{1.1.7}$$




(Usually, the symbol $\bar a_x$ stands for the APV in the case of a constant payment rate; see
Section 3.1. We keep the same symbol here to avoid complicated notation.)


EXAMPLE 2. Let us revisit Example 1b. In this case, ${}_tp_x = e^{-\mu t}$, and by (1.1.7),

$$\bar a_x = \int_0^\infty e^{-\delta t}\,t\,e^{-\mu t}\,dt = \int_0^\infty t\,e^{-(\mu+\delta)t}\,dt. \tag{1.1.8}$$

The last integral is standard and equals $1/(\mu+\delta)^2$, which coincides with (1.1.4). (The
variable change $s = (\mu+\delta)t$ leads to $(\mu+\delta)^{-2}\int_0^\infty s\,e^{-s}\,ds$. The last integral equals one.) We
see that calculations turned out to be a bit easier than what we did in Example 1b.
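When the integral in (1.1.7) has no convenient closed form, it can be evaluated numerically. Below is a minimal sketch using the midpoint rule; the truncation point, the step count, and the values of $\mu$ and $\delta$ are assumptions made for illustration only.

```python
import math

def apv_current_payment(c, survival, delta, upper=500.0, steps=100_000):
    """Midpoint-rule approximation of (1.1.7), truncated to [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.exp(-delta * t) * c(t) * survival(t)
    return total * h

mu, delta = 0.02, 0.04  # hypothetical values chosen for illustration

# Example 2: payment rate c_t = t and survival function tpx = exp(-mu * t)
apv = apv_current_payment(lambda t: t, lambda t: math.exp(-mu * t), delta)
print(apv, 1.0 / (mu + delta) ** 2)
```

The same function accepts any payment rate and survival function, which is the practical appeal of the current payment technique.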


Nevertheless, the advantages of the second approach should not be overestimated. For- 
mula (1.1.6) is indeed useful in computing expectations. But, for example, in calculating 
variances, representation (1.1.5) is not so helpful. [See, however, how one can apply this 
technique in computing second moments in (7.1.2.11)]. 

The approach we used in Example 1 is called the aggregate payment technique. The 
alternative approach is referred to as the current payment technique. We will use both— 
whichever turns out to be more convenient. 


1.2 Discrete annuities 


The model is very similar to what we considered above. Let $\Psi$ be an integer-valued r.v.
Consider $\Psi$ time intervals of a unit length, say, $\Psi$ complete years. Denote by $c_t$ the payment
at an integer time $t$. If the first payment is made at the initial time $t = 0$, the second payment
is made at the beginning of the second period (that is, at time $t = 1$), and so on, then the
present value of the total payment during $\Psi$ periods is equal to

$$\ddot Y = c_0 + c_1 v + c_2 v^2 + \dots + c_{\Psi-1}v^{\Psi-1}. \tag{1.2.1}$$

Such annuities are called annuities-due.
Now consider $\tilde\Psi$ intervals, and assume that payments are provided at the end of each
interval. Then the present value of the total payment is

$$Y = v c_1 + v^2 c_2 + \dots + v^{\tilde\Psi}c_{\tilde\Psi}.$$

This type is called annuities-immediate or payable in arrears.

In both cases, the r.v.'s $\Psi$ and $\tilde\Psi$ coincide with the number of payments. We use different
symbols to emphasize that, for the same lifetime and for the same period of payments, $\Psi$
and $\tilde\Psi$ are different.

For example, consider a life annuity on $(x)$ providing for payments until the death of the
annuitant. For the annuities-due, the last payment is made at the beginning of the year of
death, that is, at time $t = K$, where $K = K(x)$ is the curtate lifetime. Then the number of
payments $\Psi = K + 1$, and the present value

$$\ddot Y = c_0 + c_1 v + c_2 v^2 + \dots + c_K v^K. \tag{1.2.2}$$


In the case of an annuity-immediate, the last payment is provided at the same time $t = K$
since at the end of the year of death, the company will not pay. In this case, the number of
intervals (or, equivalently, the number of payments) equals $K$. So, $\tilde\Psi = K$, and the present
value

$$Y = c_1 v + c_2 v^2 + \dots + c_K v^K \ \text{ if } K > 0, \quad\text{and}\quad Y = 0 \ \text{ if } K = 0.$$

We see that there is a simple relation between $\ddot Y$ and $Y$:

$$\ddot Y = c_0 + Y.$$

For this reason, it suffices, at least theoretically, to study just one type, for example,
annuities-due.

If we somehow manage to write a good expression for the sum in (1.2.1), then we can
find the distribution of $\ddot Y$ and its moments. In the following sections, we will demonstrate
that this is quite easy if $c_t$ is constant.

However, in general, such a summation may turn out to be difficult or even analytically 
impossible. In this case, at least for computing APVs, the current payment technique may 
help. 

Indeed, similar to what we did in Section 1.1, we can rewrite (1.2.1) as

$$\ddot Y = \sum_{t=0}^{\infty} c_t v^t\,\mathbf{1}_{\{\Psi - 1 \ge t\}}. \tag{1.2.3}$$

The traditional notation for the APV of annuities-due is $\ddot a$ (with indices when needed).
From (1.2.3) it follows that

$$\ddot a = E\{\ddot Y\} = \sum_{t=0}^{\infty} c_t v^t\,P(\Psi \ge t+1). \tag{1.2.4}$$

Consider a life annuity-due for $(x)$, and denote the corresponding APV by $\ddot a_x$. As in
the case of continuous time, $\ddot a_x$ usually stands for the APV in the case of a level (constant)
payment rate (see Section 3.1). We keep the same symbol here to make the notation simpler.

We saw that for a life annuity-due, the r.v. $\Psi = K(x) + 1$. Replacing $t$ by $k$ for further
convenience, we get from (1.2.4) that

$$\ddot a_x = \sum_{k=0}^{\infty} c_k v^k\,P(K \ge k) = \sum_{k=0}^{\infty} c_k v^k\,{}_kp_x. \tag{1.2.5}$$


EXAMPLE 1. (a) An organization provides annual payments to a 20-year-old person for
six years (for example, while the person studies at a university). Payments are made at
the beginning of each year: 20 units in each of the first two years, 25 in the third year,
30 in the fourth year, and 35 in each of the remaining years. The values of $l_{20+k}$ are given in
the spreadsheets in Fig. 1ab; $v = 0.96$. Find the net single premium and the corresponding
variance.

In our problem, $c_k = 0$ for $k > 5$, and, by (1.2.5), to compute the APV we should compute
$\sum_{k=0}^{5} c_k v^k\,{}_kp_{20}$. It is convenient to do so using a worksheet.

In Column C in Fig. 1a, we compute ${}_kp_{20} = l_{20+k}/l_{20}$. Column D contains values of
payments. In Column E, we compute the products $c_k v^k\,{}_kp_{20}$. Then the APV $\ddot a_{20}$ equals the
sum of the values in Column E. We see that this is a very easy procedure.
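The worksheet of part (a) translates directly into a few lines of code. The sketch below uses the survival probabilities ${}_kp_{20}$ as they appear in Fig. 1a (rounded to eight digits) and the payments from the problem statement; the variable names are ours.

```python
v = 0.96

# Column C of Fig. 1a: kp20 for k = 0, ..., 5 (values as printed in the figure)
k_p_20 = [1.0, 0.99953226, 0.99801718, 0.99647161, 0.9928822, 0.98917078]

# Column D: payments c_k at the beginning of years 1, ..., 6
payments = [20, 20, 25, 30, 35, 35]

# Column E: the products c_k * v^k * kp20; their sum is the APV by (1.2.5)
terms = [c * v**k * p for k, (c, p) in enumerate(zip(payments, k_p_20))]
apv = sum(terms)

print([round(term, 4) for term in terms])
print(round(apv, 4))
```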


    k    kp20          c_k    c_k v^k kp20
    0    1             20     20
    1    0.99953226    20     19.1910194
    2    0.99801718    25     22.9943159
    3    0.99647161    30     26.4484291
    4    0.9928822     35     29.5155378
    5    0.98917078    35     28.2289996
                       APV    146.378302

(a) A worksheet for Example 1a.

    k    P(K=k)       c_k v^k     PV          (PV)^2       PV·P(K=k)
    0    0.0004677    20          20          400          0.009355
    1    0.0015151    19.2        39.2        1536.64      0.059391
    2    0.0015456    23.04       62.24       3873.8176    0.096197
    3    0.0035894    26.54208    88.78208    7882.258     0.318675
    4    0.0037114    29.72713    118.5092    14044.43     0.439838
    5+   0.9891708    28.53804    147.0473    21622.89     145.4548

    APV = 146.3783;  Variance = 51.048494
    (The column (PV)^2·P(K=k) is not reproduced.)

(b) A worksheet for Example 1b; the aggregate payment technique.
PV stands for 'present value'.


FIGURE 1.


(b) In the worksheet in Fig. 1b, we apply the aggregate payment technique. In Column C,
we compute $P(K = k) = (l_{20+k} - l_{20+k+1})/l_{20}$ with one exception: in Cell C7 we compute
$P(K \ge 5) = l_{25}/l_{20}$. This is the probability that the annuitant will attain the age of 25 and
will receive the last payment. In Column E, we compute $c_k v^k$, and in Column F, the values
of the present value in the cases $K = k$ for $k = 0, \dots, 4$. Cell F7 corresponds to the case
$K \ge 5$. The command, say, for Cell F5 is '=SUM($E$2:E5)', which corresponds to the
expression in (1.2.2) for $K = k$.

Column G contains the squares of the values in Column F. It prepares us for computing
the variance. In Column H, we multiply each value of the present value by its probability.
For example, Cell H5=F5*C5. In Column I, we do the same with the squares. For example,
I5=G5*C5.

Now, to compute the APV, it suffices to add up all numbers in Column H, which is done
in Cell C11. The sum of all numbers in Column I is $E\{\ddot Y^2\}$, so to compute the variance in
C13 we should subtract from this sum the square of the APV.

We see that the procedure is longer, but it allows us to compute the variance, and in the
same manner, all other moments.
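The aggregate payment worksheet can be mirrored in code as well. This sketch rebuilds the probabilities $P(K = k)$ from the survival probabilities printed in Fig. 1a, so its output may differ from the spreadsheet in the last digits; the variable names are ours.

```python
v = 0.96
k_p_20 = [1.0, 0.99953226, 0.99801718, 0.99647161, 0.9928822, 0.98917078]
payments = [20, 20, 25, 30, 35, 35]

# Column C: P(K = k) for k = 0..4, and P(K >= 5) in the last slot (Cell C7)
probs = [k_p_20[k] - k_p_20[k + 1] for k in range(5)] + [k_p_20[5]]

# Column F: present value (1.2.2) if payments c_0, ..., c_k are made
present_values = []
running = 0.0
for k, c in enumerate(payments):
    running += c * v**k          # Column E: c_k * v^k
    present_values.append(running)

# Columns H and I: first and second moments of the present value
apv = sum(y * p for y, p in zip(present_values, probs))
second_moment = sum(y * y * p for y, p in zip(present_values, probs))
variance = second_moment - apv**2

print(round(apv, 4), round(variance, 3))
```

The APV agrees with the current payment computation of part (a), as it must, while the variance is available only through this second route.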




EXAMPLE 2 ([158, N12]¹). An insurance company has agreed to make payments to
a worker who is age $x$ and was injured at work. The payments are 120,000 per year,
paid annually, starting immediately and continuing for the remainder of the worker's life.
After the first 500,000 is paid by the insurance company, the remainder will be paid by a
reinsurance company. The survival function ${}_tp_x = (0.6)^t$ for $0 \le t \le 6.5$, and ${}_tp_x = 0$ for
$t > 6.5$. The annual effective interest $i = 0.05$. Calculate the APV of the payment to be
made by the reinsurer.

We use the general representation (1.2.5). Let \$1000 be a monetary unit. Since $120 \times 4 < 500$
while $120 \times 5 > 500$, the first 500 will have been paid at the beginning of the fifth year; that is,
at the time moment $k = 4$. (The payments start at $k = 0$.)

The surplus over 500,000 at this moment is $600 - 500 = 100$. Hence, for the reinsurer,
$c_k = 0$ for $k < 4$, $c_4 = 100$, and $c_k = 120$ for $k > 4$.

Also, ${}_kp_x = 0$ for $k > 6$, and the discount $v = \frac{1}{1+i} = \frac{1}{1.05}$. Hence, by virtue of (1.2.5),

$$\ddot a_x = \sum_{k=0}^{6} c_k v^k\,{}_kp_x = 100\cdot\left(\frac{0.6}{1.05}\right)^4 + 120\cdot\sum_{k=5}^{6}\left(\frac{0.6}{1.05}\right)^k \approx 22.151.$$

2 LEVEL ANNUITIES. A CONNECTION WITH INSURANCE 


In the particular case when payments are constant (or level) during the payment period,
calculations turn out to be simpler, and the aggregate payment technique leads to nice
results. Once payments are constant, we can assume, without loss of generality, that they
are made at a unit rate. So, we set $c_t = 1$ for both the continuous and discrete cases.


2.1 Certain annuities 


Consider an annuity that is payable continuously at a constant rate of one per unit of time
during a period $[0,t]$. As we saw already, the present value of the total payment is equal to

$$\int_0^t e^{-\delta s}\,ds = \frac{1}{\delta}\left(1 - e^{-\delta t}\right). \tag{2.1.1}$$

The traditional notation of the present value in the l.-h.s. of (2.1.1) is $\bar a_{\overline{t}|}$. The bar
indicates, as usual, that we deal with a continuous time model. The time $t$ is put into the angle
in order to indicate that this is the duration of the period under consideration. This
distinguishes this notation from the notation $\bar a_x$, which we will use for the expected present
value of the whole life annuity on a life-age-$x$. In the latter case, the index is an initial age
rather than the length of the total payment period. Thus,

$$\bar a_{\overline{t}|} = \frac{1}{\delta}\left(1 - e^{-\delta t}\right). \tag{2.1.2}$$

¹Reprinted with permission of the Casualty Actuarial Society.




For the discrete annuities-due with unit payments at the beginning of each period, the
present value of the total payment during $n$ periods is equal to

$$1 + v + v^2 + \dots + v^{n-1} = \frac{1 - v^n}{1 - v}. \tag{2.1.3}$$

The quantity in the l.-h.s. of (2.1.3) is denoted by $\ddot a_{\overline{n}|}$. Setting $d = 1 - v$, we write

$$\ddot a_{\overline{n}|} = \frac{1}{d}\left(1 - v^n\right). \tag{2.1.4}$$

For an annuity-immediate, where during the same $n$ intervals payments are provided at
the end of each interval, the present value of the total payment is denoted by $a_{\overline{n}|}$ and is
equal to

$$v + v^2 + \dots + v^n = v\,\frac{1 - v^n}{1 - v} = v\,\ddot a_{\overline{n}|}. \tag{2.1.5}$$

Thus, $a_{\overline{n}|} = v\,\ddot a_{\overline{n}|}$.
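The three certain-annuity formulas are simple enough to code directly and compare with brute-force sums. In the sketch below the rate $\delta = 0.05$ and the term $n = 10$ are arbitrary illustrative choices, and the function names are ours.

```python
import math

def a_bar_certain(t, delta):
    """Continuous certain annuity (2.1.2)."""
    return (1.0 - math.exp(-delta * t)) / delta

def a_due_certain(n, v):
    """Certain annuity-due (2.1.4) with d = 1 - v."""
    return (1.0 - v**n) / (1.0 - v)

def a_immediate_certain(n, v):
    """Certain annuity-immediate via (2.1.5)."""
    return v * a_due_certain(n, v)

delta, n = 0.05, 10       # hypothetical values chosen for illustration
v = math.exp(-delta)

direct_due = sum(v**k for k in range(n))                # 1 + v + ... + v^(n-1)
direct_immediate = sum(v**k for k in range(1, n + 1))   # v + v^2 + ... + v^n

print(a_due_certain(n, v), direct_due)
print(a_immediate_certain(n, v), direct_immediate)
print(a_bar_certain(n, delta))
```

For the same term, the continuous value lies between the annuity-immediate and the annuity-due values, since continuous payments arrive later than payments in advance and earlier than payments in arrears.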


2.2 Random annuities 


Now, let the payment interval be random. To emphasize this, we replace the symbol $t$ by
$\Psi$, which again stands for a r.v. Then in the case of continuous payment, the r.v.

$$\bar Y = \bar a_{\overline{\Psi}|}$$

is the (random) present value of annuities, and in accordance with (2.1.2),

$$\bar Y = \frac{1}{\delta}(1 - Z), \quad\text{where } Z = e^{-\delta\Psi}. \tag{2.2.1}$$

The main point here is that $Z$ is the present value of an insurance (!) which provides for
payment of a unit of money at time $\Psi$.
By virtue of (2.2.1), if we know the distribution of $Z$, then we can readily compute the
distribution of $\bar Y$, and vice versa. See also Exercise 6 and the comments included there.
Let us consider the APVs (or net single premiums). Setting $\bar a = E\{\bar Y\}$ and $A = E\{Z\}$, we
get from (2.2.1) that

$$\bar a = \frac{1}{\delta}(1 - A), \tag{2.2.2}$$

or

$$A + \delta\bar a = 1. \tag{2.2.3}$$

For the variance, we immediately obtain from (2.2.1) that

$$\mathrm{Var}\{\bar Y\} = \frac{1}{\delta^2}\,\mathrm{Var}\{Z\}. \tag{2.2.4}$$

In the discrete case, we have practically the same representation. Let discrete payments
be provided during a period $[0,\Psi]$ where $\Psi$ is an integer-valued r.v. Then $\Psi$ is the number
of payments. Setting

$$\ddot Y = \ddot a_{\overline{\Psi}|},$$

the present value of the annuity-due, we derive from (2.1.4) that

$$\ddot Y = \frac{1}{d}(1 - Z), \quad\text{where } Z = v^{\Psi} = e^{-\delta\Psi}. \tag{2.2.5}$$

The r.v. $Z$ is again the present value of an insurance providing a single payment of one unit
at time $\Psi$.
The only difference between (2.2.5) and (2.2.1) is in the denominator. In the discrete
case, it equals $d = 1 - v$; in the continuous case, it is $\delta$. Note that this difference is not
very significant since $d = 1 - v = 1 - e^{-\delta} = \delta - \frac{\delta^2}{2} + o(\delta^2)$ for small $\delta$; see (0.4.2.6). For
example, if $\delta = 0.05$, then $d = 1 - e^{-\delta} \approx 0.04877$.
> Making use of (0.4.2.2), one can show that 
o 8 5? 50° 
< < m 
A 
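Setting again $\ddot a = E\{\ddot Y\}$ and $A = E\{Z\}$, we get from (2.2.5) the counterparts of
(2.2.2)-(2.2.4) for the discrete case:

$$\ddot a = \frac{1}{d}(1 - A), \quad\text{or}\quad A + d\,\ddot a = 1, \tag{2.2.6}$$

and

$$\mathrm{Var}\{\ddot Y\} = \frac{1}{d^2}\,\mathrm{Var}\{Z\}. \tag{2.2.7}$$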


3 SOME PARTICULAR TYPES OF LEVEL ANNUITIES 
3.1 Whole life annuities 


First, consider the whole life annuity providing continuous payment at a rate of one to a life-age-$x$ until death. This means that $\Psi = T = T(x)$. The APV of the annuity is denoted by $\bar{a}_x$. From (2.2.2), (2.2.4), and (8.2.1.3) it follows that

$$\bar{a}_x = \frac{1}{\delta}(1 - \bar{A}_x), \qquad (3.1.1)$$

$$\mathrm{Var}\{Y\} = \frac{{}^{2}\bar{A}_x - (\bar{A}_x)^2}{\delta^2}. \qquad (3.1.2)$$


EXAMPLE 1. Consider the situation of Example 8.1.1-1, where $T$ follows De Moivre's law. In this example, we found that $\bar{A}_x = \frac{1 - e^{-s}}{s}$, where $s = (\omega - x)\delta$. Then, by (3.1.1),

$$\bar{a}_x = \frac{e^{-s} - 1 + s}{\delta s}. \qquad (3.1.3)$$

For $x = 60$, $\omega = 100$, and $\delta = 0.04$, we have computed in Example 8.1.1-1 that $\bar{A}_x \approx 0.499$. Then, using (3.1.1) directly, we have

$$\bar{a}_{60} \approx \frac{1}{0.04}(1 - 0.499) = 12.525.$$

Since the payments are carried out at the unit rate per unit of time, the total amount to be paid without discounting is equal to $\int_0^{T(60)} 1\,dt = T(60)$. The r.v. $T(60)$ is uniformly distributed on $[0, 40]$, so $E\{T\} = 20$. Thus, on the average, 20 units will be paid. In order to provide it, the initial fund should be equal to 12.525 on the average.

For the variance, we should just divide the expression (8.1.1.7) by $\delta^2$.


In the case of the whole life annuity-due on $(x)$, we denote the APV by $\ddot{a}_x$. In this case, $\Psi = K + 1$, and in accordance with (2.2.6) and (2.2.7),

$$\ddot{a}_x = \frac{1}{d}(1 - A_x), \qquad (3.1.4)$$

$$\mathrm{Var}\{Y\} = \frac{{}^{2}A_x - (A_x)^2}{d^2}. \qquad (3.1.5)$$
EXAMPLE 2. Consider a whole life annuity-due for the same case as in Example 1. To this end, we use results from Example 8.1.1-2, where we have calculated that

$$A_x = \frac{v(1 - v^r)}{rd},$$

with $r = \omega - x$. Substituting it into (3.1.4), we immediately get that

$$\ddot{a}_x = \frac{rd - v(1 - v^r)}{rd^2}.$$

In Example 8.1.1-2, for $\omega = 100$, $x = 60$, and $\delta = 0.04$, we computed that $A_x \approx 0.4889$. Then, by (3.1.4), $\ddot{a}_x \approx (1 - 0.4889)/(1 - e^{-0.04}) \approx 13.035$. This is larger than the answer in Example 1. (Why? See also Exercise 15.)
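The two De Moivre examples above can be reproduced numerically. The following is a minimal sketch, assuming the closed forms quoted from Examples 8.1.1-1 and 8.1.1-2; the function names are illustrative and not part of the text.

```python
import math

def de_moivre_continuous(x, omega, delta):
    """Return (A_bar, a_bar) when T(x) is uniform on [0, omega - x]."""
    s = (omega - x) * delta
    A_bar = (1 - math.exp(-s)) / s          # APV of the whole life insurance
    a_bar = (1 - A_bar) / delta             # relation (3.1.1)
    return A_bar, a_bar

def de_moivre_annuity_due(x, omega, delta):
    """Return (A, a_due) when K(x) is uniform on {0, ..., omega - x - 1}."""
    r = omega - x
    v = math.exp(-delta)
    d = 1 - v
    A = v * (1 - v**r) / (r * d)            # discrete whole life insurance
    a_due = (1 - A) / d                     # relation (3.1.4)
    return A, a_due

A_bar, a_bar = de_moivre_continuous(60, 100, 0.04)
A, a_due = de_moivre_annuity_due(60, 100, 0.04)
print(f"continuous:  A = {A_bar:.4f}, a = {a_bar:.3f}")
print(f"annuity-due: A = {A:.4f}, a = {a_due:.3f}")
```

Carrying full precision gives a continuous APV of about 12.53; the value 12.525 in Example 1 comes from rounding the insurance APV to 0.499 before dividing by 0.04.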


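EXAMPLE 3. Let $\mu(x) = \mu$. Then, as was obtained in (8.1.1.8), $\bar{A}_x = \frac{\mu}{\mu + \delta}$ and $\mathrm{Var}\{Z\} = \frac{\mu}{\mu + 2\delta} - \left(\frac{\mu}{\mu + \delta}\right)^2$. Hence, by (3.1.1),

$$\bar{a}_x = \frac{1}{\delta}\left(1 - \frac{\mu}{\mu + \delta}\right) = \frac{1}{\mu + \delta}. \qquad (3.1.6)$$

Certainly, the case of the exponential distribution is simple, and we could readily compute the same directly, without using (3.1.1).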

EXAMPLE 4. In the Illustrative life table in Appendix, Section 3, using (3.1.4), we provide the values of $\ddot{a}_x$ based on the original data from Table 7.1.5.1-1. Let, for example, $x = 70$. Then $A_{70} = 0.5729602$, and

$$\ddot{a}_{70} = \frac{1}{1 - e^{-0.04}}(1 - 0.5729602) = 10.89093833.$$

Thus, the value of the annuity of one dollar per year, paid at the beginning of each year starting from the age 70, is $10.89. This is certainly smaller than the expected amount to be paid: $e_{70} = 14.7$ (see Table 7.1.5.1-1), so on the average, the annuitant will get $14.7. The reader can easily recalculate all of this for an annuity of, say, $50,000 per year (for example, for a pension annuity of this size).


Route 1 => page 482 


EXAMPLE 5. We revisit Example 8.3.2-1. As was promised there, consider the problem under the assumption that the company pays to the insured her/his current salary from the moment of the loss of the job until the moment when the insured finds a new job or dies. Denote by $\xi$ the duration of the payment period, and assume $\xi$ to be exponential with parameter $b = 2$, and to be independent of the time of failure. For simplicity, let us consider the case when the company pays the salary continuously.

If the job is lost at a moment $t$, the lost annual salary is $e^{0.04t}$. The payment of the salary is a continuous annuity over the random period $[0, \xi]$. If the company had paid one unit of money per year, from the standpoint of the time $t$ the present value of this payment would have been $\frac{1}{\delta}(1 - e^{-\delta\xi})$ by virtue of (2.2.1). Since the annual salary is not one but $e^{0.04t}$, the present value mentioned is $e^{0.04t}\frac{1}{\delta}(1 - e^{-\delta\xi})$. From the standpoint of the initial time 0, the present value of the same payment is

$$R_t = e^{-\delta t}e^{0.04t}\frac{1}{\delta}(1 - e^{-\delta\xi}) = e^{-0.02t}\frac{1}{0.06}(1 - e^{-0.06\xi}),$$

since $\delta = 0.06$. Consequently,

$$R_t = e^{-0.02t}\,Y, \quad \text{where } Y = \frac{1}{0.06}(1 - e^{-0.06\xi}).$$

The r.v. $Y$ is the present value of the whole life continuous annuity with the lifetime $\xi$. The rest can be handled as in Example 8.3.2-1. The present value of the insurance is the r.v.

$$Z = \begin{cases} 0 & \text{with probability } 0.9, \\ e^{-0.02X}\,Y & \text{with probability } 0.1, \end{cases}$$

where $X$ is uniformly distributed on $[0, 10]$. The net single premium

$$A = E\{Z\} = 0.1 \cdot E\{e^{-0.02X}\,Y\} = 0.1 \cdot E\{e^{-0.02X}\}\,E\{Y\}$$

because $X$ and $Y$ are independent. We have computed in Example 8.3.2-1 that $E\{e^{-0.02X}\} \approx 0.906$. Because $\xi$ is exponential, (3.1.6) implies that

$$E\{Y\} = \frac{1}{b + \delta} = \frac{1}{2.06} \approx 0.485,$$

where $b$ is the parameter of $\xi$; see Example 8.3.2-1. Thus, $A \approx 0.1 \cdot 0.906 \cdot 0.485 \approx 0.0439$.
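The arithmetic of Example 5 can be checked in a few lines. This is only a sketch: the salary growth rate 0.04 is an assumption, inferred from the net exponent $0.02 = 0.06 - 0.04$ that appears in the example, and the variable names are illustrative.

```python
import math

delta = 0.06          # force of interest
growth = 0.04         # assumed salary growth rate (salary e^{0.04 t})
b = 2.0               # parameter of the exponential payment period
horizon = 10.0        # X is uniform on [0, 10]
p_loss = 0.1          # probability that the job is lost

net = delta - growth
# E{exp(-net * X)} for X uniform on [0, horizon]
E_discount = (1 - math.exp(-net * horizon)) / (net * horizon)
# E{Y} for an exponential payment period, by (3.1.6)
E_Y = 1 / (b + delta)
A = p_loss * E_discount * E_Y
print(round(E_discount, 3), round(E_Y, 3), round(A, 4))
```

With unrounded intermediate values the premium comes out marginally above the 0.0439 obtained in the text from the rounded factors.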


Certainly, the aggregate payment technique above is not the only way to compute the APV. We can use the general formulas (1.1.7) and (1.2.5) where, in the case of the whole life annuity, we set $c_t = 1$ (or $c_k = 1$ in the discrete case). This leads to

$$\bar{a}_x = \int_0^{\infty} e^{-\delta t}\,{}_tp_x\,dt, \qquad (3.1.7)$$

$$\ddot{a}_x = \sum_{k=0}^{\infty} v^k\,{}_kp_x. \qquad (3.1.8)$$

We will use these formulas repeatedly.


Next, we consider the recurrence formula

$$\ddot{a}_x = 1 + v\,p_x\,\ddot{a}_{x+1}. \qquad (3.1.9)$$

We prove (3.1.9) below, although it is almost obvious heuristically. For an annuity-due, the first unit payment is made at the very beginning, and this corresponds to the term 1 in (3.1.9). If the annuitant survives the first year (the probability of this is $p_x$), then the annuity to be paid will be equivalent to the whole life annuity for a life-age-$(x+1)$. The APV of this annuity is $\ddot{a}_{x+1}$. Since we evaluate it from the standpoint of the time $t = 0$, we should multiply $\ddot{a}_{x+1}$ by the discount factor $v$. All of this is again relevant to the first step analysis considered in Section 4.4.3.

A formal proof of (3.1.9) may run as follows. The term for $k = 0$ in (3.1.8) equals one because ${}_0p_x = 1$. Making the variable change $m = k - 1$ and using (7.1.2.7), we can write that

$$\ddot{a}_x = 1 + \sum_{k=1}^{\infty} v^k\,{}_kp_x = 1 + v\sum_{k=1}^{\infty} v^{k-1}\,p_x\,{}_{k-1}p_{x+1} = 1 + v\,p_x\sum_{k=1}^{\infty} v^{k-1}\,{}_{k-1}p_{x+1} = 1 + v\,p_x\sum_{m=0}^{\infty} v^m\,{}_mp_{x+1} = 1 + v\,p_x\,\ddot{a}_{x+1}. \qquad (3.1.10)$$

The recursion relation (3.1.9) may be applied in practical calculations. If we manage to evaluate $\ddot{a}_x$ for large $x$'s, say, for $x = 100$, then we can move backward and calculate $\ddot{a}_x$ for all other $x$'s. On the other hand, for very old people, the probability of dying within a short period is high, so we can assume that, say, $\ddot{a}_{100}$ is close to one. See also Example 8.2.1.2-2.

We consider a counterpart of (3.1.9) for continuous payments in Exercise 13 and do particular calculations in Exercise 11.
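The backward use of (3.1.9) might be sketched as follows. The function name, the argument layout, and the De Moivre illustration are assumptions made for this sketch; the starting value at the oldest age plays the role of the approximation "close to one" mentioned above.

```python
import math

def annuity_due_by_recursion(p, v, terminal=1.0):
    """Backward recursion (3.1.9).

    p[j] is the one-year survival probability at age x0 + j;
    `terminal` is the assumed APV at age x0 + len(p).
    Returns a list a with a[j] = APV of the annuity-due at age x0 + j.
    """
    a = [0.0] * (len(p) + 1)
    a[-1] = terminal
    for j in range(len(p) - 1, -1, -1):
        a[j] = 1 + v * p[j] * a[j + 1]
    return a

# Illustration: De Moivre's law with omega = 100, ages 60..99, delta = 0.04.
omega, x0, v = 100, 60, math.exp(-0.04)
p = [(omega - age - 1) / (omega - age) for age in range(x0, omega)]
a = annuity_due_by_recursion(p, v)

# Direct evaluation of the sum (3.1.8) for comparison.
r = omega - x0
direct = sum(v**k * (r - k) / r for k in range(r))
print(a[0], direct)
```

In this illustration the survival probability at age 99 is zero, so the terminal value does not affect the result; with a real life table the choice of the starting value matters only through heavily discounted terms.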


3.2 Temporary annuities 


Let $T = T(x)$ be the lifetime of $(x)$, and let $K = K(x) = [T(x)]$, the curtate lifetime. In the discrete case, we deal with annuities-due, not stating it explicitly each time.

An $n$-year temporary life annuity provides for regular payments either until the moment of death or until the moment when the annuitant attains the age $x + n$ (whichever comes first). So, from the age $x + n$ on, the insurance organization does not pay. More precisely,

$$\Psi = \min\{n, T\} \ \text{in the continuous payment case, and} \quad \Psi = \min\{n, K + 1\} \ \text{in the case of annuities-due.} \qquad (3.2.1)$$

Note again that in the discrete case, $\Psi$ is the number of the intervals in which payments are provided. To clarify why we wrote $K + 1$ above, consider three cases.

• If $K \ge n$ (the annuitant has survived $n$ years), then $\min\{n, K + 1\} = n$; that is, there were exactly $n$ payments starting from the initial zero time.

• If the annuitant died within the last interval $[n - 1, n]$, then $K = n - 1$, and $\min\{n, K + 1\}$ is still equal to $n$. So again, there will be $n$ payments. The annuitant received the last $n$th payment at time $t = n - 1$, at the beginning of the year of death.

• If $K < n - 1$, then $\min\{n, K + 1\} = K + 1$, which is exactly equal to the number of payments in this case: starting at the time $t = 0$, and until the moment $t = K$.

The insurance with a single unit payment at the time $\Psi$ defined in (3.2.1) is an $n$-year endowment insurance (see Section 8.2.4.2). So, we can again use (2.2.1) in the continuous case and (2.2.5) in the discrete case.

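For the APV, the logic of the traditional notation leads to the respective notations $\bar{a}_{x:\overline{n}|}$ and $\ddot{a}_{x:\overline{n}|}$. In accordance with (2.2.2),

$$\bar{a}_{x:\overline{n}|} = \frac{1}{\delta}(1 - \bar{A}_{x:\overline{n}|}), \qquad (3.2.2)$$

and due to (2.2.6),

$$\ddot{a}_{x:\overline{n}|} = \frac{1}{d}(1 - A_{x:\overline{n}|}), \qquad (3.2.3)$$

where $d = 1 - v = 1 - e^{-\delta}$.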


EXAMPLE 1. Let $x = 60$, $n = 5$, $v = 0.96$, and let $l_x$ for $x = 60, \dots, 65$ be the same as in Example 8.2.4.2-1. (The reader may remember that we purposely kept the same data in the example mentioned and in Examples 8.2.3.2-2 and 8.2.4.1-1.) In Example 8.2.4.2-1, we calculated that $A_{60:\overline{5}|} \approx 0.820$, and $\mathrm{Var}\{Z\} \approx 0.0014$. Then, by (3.2.3),

$$\ddot{a}_{60:\overline{5}|} \approx \frac{1}{0.04}(1 - 0.82) = 4.50.$$

Making use of (2.2.7) and what we computed in Example 8.2.4.2-1, we can immediately compute the variance:

$$\mathrm{Var}\{Y\} = \frac{1}{d^2}\,\mathrm{Var}\{Z\} \approx \frac{0.0014}{(0.04)^2} = 0.875.$$


Another, direct way to compute APVs is to again use (1.1.7) and (1.2.5). First, consider the continuous payment case, and observe that an $n$-year temporary annuity may be viewed as a whole life annuity with varying payments $c_t = 1$ for $t \le n$ and $c_t = 0$ otherwise. (To cease paying means to start paying nothing.) The present value may be presented as in (1.1.5) with $\Psi = T(x)$ and $c_t$ as we have defined. Then (1.1.7) will lead to

$$\bar{a}_{x:\overline{n}|} = \int_0^{n} e^{-\delta t}\,{}_tp_x\,dt. \qquad (3.2.4)$$

In the discrete case, we should set $c_k = 1$ for $k = 0, \dots, n - 1$, and $c_k = 0$ for $k \ge n$, which implies

$$\ddot{a}_{x:\overline{n}|} = \sum_{k=0}^{n-1} v^k\,{}_kp_x. \qquad (3.2.5)$$


The next relation allows us to compute $\ddot{a}_{x:\overline{n}|}$ in terms of $\ddot{a}_x$. Namely, we can write that

$$\ddot{a}_x = \ddot{a}_{x:\overline{n}|} + v^n\,{}_np_x\,\ddot{a}_{x+n}, \qquad (3.2.6)$$

and hence

$$\ddot{a}_{x:\overline{n}|} = \ddot{a}_x - v^n\,{}_np_x\,\ddot{a}_{x+n}. \qquad (3.2.7)$$

Like many similar relations above, (3.2.6) is quite understandable from a heuristic point of view. The whole life annuity consists of the annuity paid during the time interval $[0, n]$, and the annuity after the annuitant attains the age $x + n$. The last event has the probability ${}_np_x$, and the APV of the annuity after the age $x + n$ should be discounted by $v^n$.

We give a formal proof at the end of this subsection. Note also that (3.1.9) follows from (3.2.6) if we set $n = 1$, since for annuities-due, where payments are made at the beginning of each period, $\ddot{a}_{x:\overline{1}|} = 1$ (why?).

A counterpart for the case of continuous payment is considered in Exercise 23.


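EXAMPLE 2 ([152, N12]*). The probability that a newborn lives to be 25 is 70%. The probability that a newborn lives to be 35 is 50%. The following annuities-due have APV equal to 60,000: a life annuity of 7,500 on (25), a life annuity of 12,300 on (35), and a life annuity of 9,400 on (25) that makes at most ten payments. What is the interest rate?

Taking 1000 as a unit of money, we have: $7.5\,\ddot{a}_{25} = 60$, $12.3\,\ddot{a}_{35} = 60$, $9.4\,\ddot{a}_{25:\overline{10}|} = 60$, ${}_{25}p_0 = s(25) = 0.7$, and ${}_{35}p_0 = s(35) = 0.5$. From this, we get ${}_{10}p_{25} = \frac{0.5}{0.7} \approx 0.71$, $\ddot{a}_{25} = 8$, $\ddot{a}_{35} \approx 4.88$, and $\ddot{a}_{25:\overline{10}|} \approx 6.4$. By (3.2.6),

$$\ddot{a}_{25} = \ddot{a}_{25:\overline{10}|} + v^{10}\,{}_{10}p_{25}\,\ddot{a}_{35},$$

from which we obtain (carrying unrounded intermediate values) that

$$v^{10} = \frac{\ddot{a}_{25} - \ddot{a}_{25:\overline{10}|}}{{}_{10}p_{25}\,\ddot{a}_{35}} \approx 0.464.$$

Then $v \approx 0.926$, and $\delta = -\ln v \approx 0.077 = 7.7\%$.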
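The computation in Example 2 can be traced step by step. The sketch below uses only the data stated in the example; the variable names are illustrative, and the effective annual rate $i = 1/v - 1$ is printed alongside $\delta$ for comparison.

```python
import math

# Data of the example, in units of 1000.
a25 = 60 / 7.5            # whole life annuity-due on (25)
a35 = 60 / 12.3           # whole life annuity-due on (35)
a25_10 = 60 / 9.4         # 10-year temporary annuity-due on (25)
p10_25 = 0.5 / 0.7        # probability that (25) survives to 35

# Solve (3.2.6) for v^10, then recover v, delta and i.
v10 = (a25 - a25_10) / (p10_25 * a35)
v = v10 ** 0.1
delta = -math.log(v)
i = 1 / v - 1
print(round(v10, 3), round(v, 3), round(delta, 4), round(i, 4))
```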


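Now consider the following generalization of (3.2.6):

$$\ddot{a}_{x:\overline{m}|} = \ddot{a}_{x:\overline{n}|} + v^n\,{}_np_x\,\ddot{a}_{x+n:\overline{m-n}|} \quad \text{for all } m = n, n + 1, \dots. \qquad (3.2.8)$$

The logic is the same: we break the period $[0, m]$ into the periods $[0, n]$ and $[n, m]$. The proof is given at the end of the current subsection. The relation (3.2.6) follows from (3.2.8) if we set $m = \infty$. (Show that $\ddot{a}_{x:\overline{\infty}|} = \ddot{a}_x$.)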


EXAMPLE 3 ([152, N2]*). Given $v = 0.95$, ${}_{10}p_{25} = 0.87$, $\ddot{a}_{25:\overline{15}|} = 9.868$, and $\ddot{a}_{35:\overline{5}|} = 4.392$, calculate $A^{1}_{25:\overline{10}|}$.

We follow the following logic. If we calculate $A_{25:\overline{10}|}$, then knowing $v$ and ${}_{10}p_{25}$, we will be able to calculate $A^{1}_{25:\overline{10}|}$. In view of (3.2.3), in order to calculate $A_{25:\overline{10}|}$, it suffices to know $\ddot{a}_{25:\overline{10}|}$. The last characteristic may be found with the use of (3.2.8).

So, first we compute ${}_{10}E_{25} = v^{10}\,{}_{10}p_{25} = (0.95)^{10} \cdot 0.87 \approx 0.5209$. Then, by (3.2.8), $\ddot{a}_{25:\overline{10}|} = \ddot{a}_{25:\overline{15}|} - v^{10}\,{}_{10}p_{25}\,\ddot{a}_{35:\overline{5}|} \approx 9.868 - 0.521 \cdot 4.392 \approx 7.580$. Then, by (3.2.3), $A_{25:\overline{10}|} = 1 - d\,\ddot{a}_{25:\overline{10}|} \approx 1 - 0.05 \cdot 7.580 \approx 0.621$. At last, by (8.2.4.7), $A^{1}_{25:\overline{10}|} = A_{25:\overline{10}|} - {}_{10}E_{25} \approx 0.621 - 0.521 = 0.100$.

* Reprinted with permission of the Casualty Actuarial Society.
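The chain of steps in Example 3 translates directly into code. This is a small sketch using the data of the example; the variable names are illustrative.

```python
v = 0.95
p10_25 = 0.87
a25_15 = 9.868            # 15-year temporary annuity-due on (25)
a35_5 = 4.392             # 5-year temporary annuity-due on (35)

E10_25 = v**10 * p10_25               # pure endowment factor
a25_10 = a25_15 - E10_25 * a35_5      # (3.2.8) solved for the 10-year annuity
d = 1 - v
A_endowment = 1 - d * a25_10          # (3.2.3): endowment insurance
A_term = A_endowment - E10_25         # term insurance = endowment - pure endowment
print(round(E10_25, 4), round(a25_10, 3), round(A_endowment, 3), round(A_term, 3))
```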


> The formula (3.2.8) gives a recursion procedure for computing $\ddot{a}_{x:\overline{n}|}$. As was already noted, $\ddot{a}_{x:\overline{1}|} = 1$. Then, setting $n = 1$ in (3.2.8), we have $\ddot{a}_{x:\overline{m}|} = 1 + v\,p_x\,\ddot{a}_{x+1:\overline{m-1}|}$. Let us replace the letter $m$ by the letter $k$ and $x$ by $x + n - k$ in the last formula. This leads to

$$\ddot{a}_{x+n-k:\overline{k}|} = 1 + v\,p_{x+n-k}\,\ddot{a}_{x+n-(k-1):\overline{k-1}|} \quad \text{for all } k = 1, 2, \dots, n. \qquad (3.2.9)$$

Set $h(k) = \ddot{a}_{x+n-k:\overline{k}|}$. From (3.2.9) it follows that

$$h(k) = 1 + v\,p_{x+n-k}\,h(k-1) \quad \text{for all } k = 1, 2, \dots, n. \qquad (3.2.10)$$

(Compare with (8.2.3.9). We could derive (3.2.10) from (8.2.3.9) and (3.2.3), but such a derivation will not be any shorter than the direct proof.)

Observe that $h(n) = \ddot{a}_{x:\overline{n}|}$, while $h(1) = \ddot{a}_{x+n-1:\overline{1}|} = 1$, so we can provide a backward recursion. An example is considered in Exercise 14. <
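A minimal sketch of the recursion (3.2.10) is given below, together with the direct sum (3.2.5) for comparison. The convention $h(0) = 0$ (no payments remain at age $x + n$) is used so that the first step reproduces $h(1) = 1$; the survival probabilities in the illustration are hypothetical.

```python
def temporary_annuity_due(p, v):
    """APV of an n-year temporary annuity-due via (3.2.10).

    p[j] is the one-year survival probability at age x + j, j = 0, ..., n-1.
    """
    n = len(p)
    h = 0.0                        # h(0): no payments remain at age x + n
    for k in range(1, n + 1):
        h = 1 + v * p[n - k] * h   # h(k) = 1 + v * p_{x+n-k} * h(k-1)
    return h                       # h(n)

def temporary_annuity_due_direct(p, v):
    """The same APV from the sum (3.2.5)."""
    total, kpx = 0.0, 1.0          # kpx holds the k-year survival probability
    for k in range(len(p)):
        total += v**k * kpx
        kpx *= p[k]
    return total

p = [0.99, 0.98, 0.97, 0.96, 0.95]   # hypothetical survival probabilities
print(temporary_annuity_due(p, 0.96), temporary_annuity_due_direct(p, 0.96))
```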


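We proceed to a formal proof of (3.2.8). Similar to what we did when proving (3.1.10), and using (7.1.2.7), we have

$$\ddot{a}_{x:\overline{m}|} = \sum_{k=0}^{m-1} v^k\,{}_kp_x = \sum_{k=0}^{n-1} v^k\,{}_kp_x + \sum_{k=n}^{m-1} v^k\,{}_kp_x = \ddot{a}_{x:\overline{n}|} + \sum_{k=n}^{m-1} v^k\,{}_kp_x = \ddot{a}_{x:\overline{n}|} + v^n\,{}_np_x\sum_{k=n}^{m-1} v^{k-n}\,{}_{k-n}p_{x+n}.$$

Under the variable change $s = k - n$, this implies that

$$\ddot{a}_{x:\overline{m}|} = \ddot{a}_{x:\overline{n}|} + v^n\,{}_np_x\sum_{s=0}^{m-n-1} v^s\,{}_sp_{x+n} = \ddot{a}_{x:\overline{n}|} + v^n\,{}_np_x\,\ddot{a}_{x+n:\overline{m-n}|}.$$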


3.3 Deferred annuities 


In an $m$-year deferred whole life annuity on a life-age-$x$, the process of payments starts at the age $x + m$ (that is, $m$ years after the time of policy issue), provided that the annuitant attains the age $x + m$. A typical example is a pension plan.

It is probably most convenient to define it formally as the whole life annuity with payments

$$c_t = 0 \ \text{if } t < m, \ \text{and } c_t = 1 \ \text{for } t \ge m \quad \text{in the case of continuous payments,}$$
$$c_k = 0 \ \text{if } k = 0, \dots, m - 1, \ \text{and } c_k = 1 \ \text{if } k = m, m + 1, \dots \quad \text{in the case of discrete payments.} \qquad (3.3.1)$$

The APVs (or net single premiums) are denoted by ${}_{m|}\bar{a}_x$ and ${}_{m|}\ddot{a}_x$, respectively. Proceeding from (3.3.1) and the general formulas (1.1.7) and (1.2.5), we readily write that

$${}_{m|}\bar{a}_x = \int_m^{\infty} e^{-\delta t}\,{}_tp_x\,dt, \qquad (3.3.2)$$

$${}_{m|}\ddot{a}_x = \sum_{k=m}^{\infty} v^k\,{}_kp_x. \qquad (3.3.3)$$


The present value itself may be presented in different ways. We prefer here the following. For a moment, let $Y_x$ denote the present value of the whole life annuity on $(x)$. Then in both cases, continuous and discrete, for the present value $Y$ of the $m$-year deferred whole life annuity on a life-age-$x$,

$$Y = \begin{cases} 0 & \text{if } T(x) < m, \\ v^m\,Y_{x+m} & \text{if } T(x) \ge m. \end{cases} \qquad (3.3.4)$$

The logic is the same as that which we have applied repeatedly. If $T(x) < m$, then the annuitant gets nothing. If $T(x) \ge m$, then the annuitant attains the age $x + m$, and the annuity becomes a usual whole life annuity whose present value is $Y_{x+m}$ from the standpoint of the time $t = m$.

Another form of representing $Y$ is discussed in Exercise 17.

Since $P(T(x) \ge m) = {}_mp_x$, from (3.3.4) it follows that

$$E\{Y\} = v^m\,{}_mp_x\,E\{Y_{x+m}\}, \qquad E\{Y^2\} = v^{2m}\,{}_mp_x\,E\{Y_{x+m}^2\}. \qquad (3.3.5)$$

From the first relation it follows that

$${}_{m|}\bar{a}_x = v^m\,{}_mp_x\,\bar{a}_{x+m}, \qquad (3.3.6)$$

$${}_{m|}\ddot{a}_x = v^m\,{}_mp_x\,\ddot{a}_{x+m} \qquad (3.3.7)$$

[compare with (8.2.2.3), (8.2.2.5)].

Note that we could also derive (3.3.6)-(3.3.7) directly from (3.3.2)-(3.3.3). Detailed advice on how to do that is given in Exercise 18.


EXAMPLE 1. Mr. Doubt, a participant in a pension plan, is exactly 60 years old, and he has been in service in his current job for 10 years. The retirement benefit of the plan pays annually (at the beginning of each year) the $k$th part of the salary at the time of retirement, multiplied by the number of years in service at the time of retirement.

Mr. D.'s salary increases by $100m\%$ each year on his birthday which, for simplicity, is January 1. (That is, if the increase is, say, 3%, then $m = 0.03$.) Mr. D. thinks about two opportunities: to retire at age 61, or to wait until the age of 68, when (in accordance with some rules) he will have to retire. Find $m$ and $k$ for which the latter opportunity is more profitable on the average, and compute the APV as a function of $m$ and $k$. The interest rate during the period under consideration is about 4%. For the population to which Mr. D. belongs, $l_{60} = 84723$, $l_{61} = 82545$, $l_{68} = 75344$, $\ddot{a}_{61} = 14.12$, and $\ddot{a}_{68} = 11.13$.

Denote by $B$ the current salary. The APV concerning the former opportunity is

$$C_1 = v \cdot p_{60} \cdot B(1 + m)k \cdot 11 \cdot \ddot{a}_{61},$$

while for the alternative opportunity, the APV is

$$C_2 = v^8 \cdot {}_8p_{60} \cdot B(1 + m)^8 k \cdot 18 \cdot \ddot{a}_{68} = Bv^8 \cdot p_{60} \cdot {}_7p_{61}(1 + m)^8 k \cdot 18 \cdot \ddot{a}_{68}.$$

[We used (7.1.2.7).] Canceling out common factors, we see that $C_2 > C_1$ iff

$$v^7 \cdot {}_7p_{61} \cdot (1 + m)^7 \cdot 18 \cdot \ddot{a}_{68} > 11\,\ddot{a}_{61}. \qquad (3.3.8)$$

So, the decision should depend neither on $k$ nor on the size of the salary, and Mr. D. can proceed in his calculations at once from the age of 61. The probability ${}_7p_{61} = l_{68}/l_{61} \approx 0.9127$, the discount $v = e^{-0.04} \approx 0.9608$, and (3.3.8) is equivalent to

$$1 + m > \frac{1}{v}\left(\frac{11\,\ddot{a}_{61}}{18\,\ddot{a}_{68} \cdot {}_7p_{61}}\right)^{1/7} \approx \frac{1}{0.9608}\left(\frac{11 \cdot 14.12}{18 \cdot 11.13 \cdot 0.913}\right)^{1/7} \approx 1.0168.$$

Thus, the retirement at 68 is more profitable if $m > 0.017$, i.e., if the annual increase exceeds about 1.7%.
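The break-even salary growth in (3.3.8) can be checked numerically. The sketch below uses only the data given in the example; the variable names are illustrative.

```python
import math

l61, l68 = 82545, 75344        # survivors at ages 61 and 68 (from the example)
a61, a68 = 14.12, 11.13        # whole life annuity-due APVs at ages 61 and 68
v = math.exp(-0.04)            # annual discount factor

p7_61 = l68 / l61              # probability that (61) survives 7 more years
# Smallest growth factor 1 + m for which retiring at 68 has the larger APV.
growth_factor = (11 * a61 / (18 * a68 * p7_61)) ** (1 / 7) / v
m_threshold = growth_factor - 1
print(round(p7_61, 4), round(growth_factor, 4), round(m_threshold, 4))
```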

Since ${}_8p_{60} = l_{68}/l_{60} \approx 0.8893$, the APV in this case is $C_2 = Bv^8 \cdot {}_8p_{60}(1 + m)^8 k \cdot 18 \cdot \ddot{a}_{68} \approx (0.9608)^8 \cdot 0.8893 \cdot 18 \cdot 11.13\,(1 + m)^8 kB \approx 129.4\,(1 + m)^8 kB$.

For example, if $m = 0.02$ and $k = 0.025$, then $C_2 \approx 3.79B$; that is, the present value amounts to a little less than four annual salaries. (Certainly, Mr. D. will get more than this (why?). We should also take into account that he has only 10 years of service in this job.)

A natural question is why we did not include in our calculations the salary that Mr. D. 
would receive if he decides to retire at the age of 68. We could do that. But then we would 
have to take into consideration the additional income which Mr. D. could have, being retired 
since the age of 61 (say, finding another job). Moreover, even if Mr. D. does not have an 
additional income after the earlier retirement, he would enjoy himself being free and, for 
example, traveling. This also should be taken into account. So, we compared only the 
APVs of retirement benefits, leaving Mr. D. to think about other issues. 


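> Consider variances. From both relations in (3.3.5), we get that

$$\mathrm{Var}\{Y\} = E\{Y^2\} - (E\{Y\})^2 = v^{2m}\,{}_mp_x\,E\{Y_{x+m}^2\} - \left(v^m\,{}_mp_x\,E\{Y_{x+m}\}\right)^2 = v^{2m}\,{}_mp_x\left[E\{Y_{x+m}^2\} - {}_mp_x\,(E\{Y_{x+m}\})^2\right].$$

Replacing $E\{Y_{x+m}^2\}$ by $\mathrm{Var}\{Y_{x+m}\} + (E\{Y_{x+m}\})^2$, we eventually get that

$$\mathrm{Var}\{Y\} = v^{2m}\,{}_mp_x\left[\mathrm{Var}\{Y_{x+m}\} + (1 - {}_mp_x)(E\{Y_{x+m}\})^2\right] = v^{2m}\,{}_mp_x\left[\mathrm{Var}\{Y_{x+m}\} + {}_mq_x\,(E\{Y_{x+m}\})^2\right],$$

where, as usual, ${}_mq_x = 1 - {}_mp_x$.

Thus, in view of (3.1.2) and (3.1.5), for the continuous case,

$$\mathrm{Var}\{Y\} = v^{2m}\,{}_mp_x\left[\frac{1}{\delta^2}\left({}^{2}\bar{A}_{x+m} - (\bar{A}_{x+m})^2\right) + {}_mq_x\,(\bar{a}_{x+m})^2\right], \qquad (3.3.9)$$

and for the discrete case,

$$\mathrm{Var}\{Y\} = v^{2m}\,{}_mp_x\left[\frac{1}{d^2}\left({}^{2}A_{x+m} - (A_{x+m})^2\right) + {}_mq_x\,(\ddot{a}_{x+m})^2\right]. \qquad (3.3.10)$$

If we wish to represent it in annuity terms only, we may write, by virtue of (3.1.1), that $\bar{A}_{x+m} = 1 - \delta\,\bar{a}_{x+m}$. For the characteristics for the double rate, we have ${}^{2}\bar{A}_{x+m} = 1 - 2\delta \cdot {}^{2}\bar{a}_{x+m}$, where ${}^{2}\bar{a}_x$ stands for the APV of the whole life annuity with the rate $2\delta$. Substituting it into (3.3.9), after some algebra one can get that

$$\mathrm{Var}\{Y\} = v^{2m}\,{}_mp_x\left[\frac{2}{\delta}\left(\bar{a}_{x+m} - {}^{2}\bar{a}_{x+m}\right) - {}_mp_x\,(\bar{a}_{x+m})^2\right]. \qquad (3.3.11)$$

We certainly can do the same for the discrete case, but we should be cautious when considering the double rate. Of course, by virtue of (3.1.4), we can write that $A_{x+m} = 1 - d\,\ddot{a}_{x+m}$, but is it correct to write ${}^{2}A_{x+m} = 1 - 2d \cdot {}^{2}\ddot{a}_{x+m}$? Certainly not, since we double $\delta$, not $d = 1 - v$. When doubling $\delta$, we square $v = e^{-\delta}$. Then $d = 1 - v$ should be replaced by $1 - v^2 = (1 - v)(1 + v) = d(2 - d) = 2d - d^2$. Eventually, ${}^{2}A_{x+m} = 1 - (2d - d^2) \cdot {}^{2}\ddot{a}_{x+m}$. Substituting it into (3.3.10) and doing some algebra, we obtain that

$$\mathrm{Var}\{Y\} = v^{2m}\,{}_mp_x\left[\frac{2}{d}\left(\ddot{a}_{x+m} - {}^{2}\ddot{a}_{x+m}\right) - {}_mp_x\,(\ddot{a}_{x+m})^2 + {}^{2}\ddot{a}_{x+m}\right] \qquad (3.3.12)$$

[compare with (3.3.11)]. <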
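The discrete variance formula (3.3.12) can be tested against a brute-force computation. The sketch below assumes, purely for illustration, that the curtate lifetime is uniform on $\{0, \dots, r - 1\}$ (De Moivre's law); the function name and the parameter values are not from the text.

```python
import math

def deferred_variance_check(r=40, m=5, delta=0.04):
    """Return (direct variance, variance by (3.3.12)) for an m-year
    deferred whole life annuity-due, with K uniform on {0, ..., r-1}."""
    v = math.exp(-delta)
    d = 1 - v

    # Direct: payments at times m, m+1, ..., K if K >= m, nothing otherwise.
    pv = [0.0 if K < m else v**m * (1 - v**(K - m + 1)) / d for K in range(r)]
    mean = sum(pv) / r
    var_direct = sum(y * y for y in pv) / r - mean**2

    # Formula (3.3.12): given survival to x + m, the remaining curtate
    # lifetime is uniform on {0, ..., n-1} with n = r - m.
    n = r - m
    mpx = n / r
    a1 = sum(v**k * (n - k) / n for k in range(n))        # annuity at rate delta
    a2 = sum(v**(2 * k) * (n - k) / n for k in range(n))  # annuity at rate 2*delta
    var_formula = v**(2 * m) * mpx * ((2 / d) * (a1 - a2) - mpx * a1**2 + a2)
    return var_direct, var_formula

var_direct, var_formula = deferred_variance_check()
print(var_direct, var_formula)
```

The two numbers agree up to rounding error, which supports the extra term coming from the replacement of $d$ by $2d - d^2$ at the doubled rate.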


3.4 Certain and life annuities 


An $n$-year certain and life annuity on $(x)$ guarantees payments during the first $n$ years, and if the annuitant attains the age $x + n$, the annuity proceeds as a whole life annuity. (So, if the annuitant dies within the first $n$ years, the payments will be made to the beneficiary until the end of the $n$th year.) In other words, the r.v.

$$\Psi = \max\{n, T\} \ \text{in the continuous payment case,} \quad \Psi = \max\{n, K + 1\} \ \text{in the case of annuities-due.} \qquad (3.4.1)$$

[Compare with (3.2.1).]

Denote by ${}_{n|}Y_x$ the present value of the $n$-year deferred annuity for the same person, keeping this notation for both cases: continuous and discrete. Also recall the notations $\bar{a}_{\overline{n}|}$ and $\ddot{a}_{\overline{n}|}$ for certain annuities during $n$ periods in the continuous and discrete cases, respectively; see representations in (2.1.2) and (2.1.4). Then, by definition, for the present value $Y$ of the annuity under discussion, we can write

$$Y = \begin{cases} \bar{a}_{\overline{n}|} + {}_{n|}Y_x & \text{in the case of continuous payments,} \\ \ddot{a}_{\overline{n}|} + {}_{n|}Y_x & \text{in the case of discrete payments.} \end{cases} \qquad (3.4.2)$$

The traditional notation for the APV in this case (the reader should prepare her/himself for what will happen now) is $\bar{a}_{\overline{x:\overline{n}|}}$ and $\ddot{a}_{\overline{x:\overline{n}|}}$, respectively. The appearance of the “big bar” is logical. We already indicated the maximum of two variables by this symbol; see the beginning of Section 7.3. The bar here indicates that we consider the maximum in (3.4.1).

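It remains to combine (3.4.2), (2.1.2), (2.1.4), (3.3.6), and (3.3.7), which immediately implies that

$$\bar{a}_{\overline{x:\overline{n}|}} = E\{Y\} = \bar{a}_{\overline{n}|} + {}_{n|}\bar{a}_x = \frac{1 - e^{-\delta n}}{\delta} + v^n\,{}_np_x\,\bar{a}_{x+n}, \qquad (3.4.3)$$

$$\ddot{a}_{\overline{x:\overline{n}|}} = E\{Y\} = \ddot{a}_{\overline{n}|} + {}_{n|}\ddot{a}_x = \frac{1 - v^n}{d} + v^n\,{}_np_x\,\ddot{a}_{x+n}. \qquad (3.4.4)$$

In view of the same representation (3.4.2),

$$\mathrm{Var}\{Y\} = \mathrm{Var}\{{}_{n|}Y_x\},$$

that is, $\mathrm{Var}\{Y\}$ coincides with the expressions (3.3.11) and (3.3.12) for the variances of deferred annuities.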

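We will come to another useful representation if for ${}_{n|}\bar{a}_x$ and ${}_{n|}\ddot{a}_x$ we use (3.3.2) and (3.3.3), respectively. It will lead to the formulas

$$\bar{a}_{\overline{x:\overline{n}|}} = \frac{1 - e^{-\delta n}}{\delta} + \int_n^{\infty} e^{-\delta t}\,{}_tp_x\,dt, \qquad (3.4.5)$$

$$\ddot{a}_{\overline{x:\overline{n}|}} = \frac{1 - v^n}{1 - v} + \sum_{k=n}^{\infty} v^k\,{}_kp_x. \qquad (3.4.6)$$

At least in the latter case, it makes sense to recall that the first term is $1 + v + v^2 + \dots + v^{n-1}$, so we can rewrite (3.4.6) in the following nice form:

$$\ddot{a}_{\overline{x:\overline{n}|}} = 1 + v + \dots + v^{n-1} + v^n\,{}_np_x + v^{n+1}\,{}_{n+1}p_x + \dots = \sum_{k=0}^{n-1} v^k + \sum_{k=n}^{\infty} v^k\,{}_kp_x. \qquad (3.4.7)$$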


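EXAMPLE 1. Given $\ddot{a}_x = 10.1$, $v = 0.96$, $p_x = 0.99$, and $p_{x+1} = 0.98$, find $\ddot{a}_{\overline{x:\overline{3}|}}$.

While $\ddot{a}_{\overline{x:\overline{3}|}} = 1 + v + v^2 + {}_{3|}\ddot{a}_x$, the whole life APV $\ddot{a}_x = 1 + v\,p_x + v^2\,{}_2p_x + {}_{3|}\ddot{a}_x$. Then the difference $\ddot{a}_{\overline{x:\overline{3}|}} - \ddot{a}_x = v(1 - p_x) + v^2(1 - {}_2p_x)$. Also recall that ${}_2p_x = p_x\,p_{x+1}$ [see (7.1.2.8)].

Hence, $\ddot{a}_{\overline{x:\overline{3}|}} - \ddot{a}_x = 0.96 \cdot 0.01 + (0.96)^2(1 - 0.99 \cdot 0.98) \approx 0.037$, and $\ddot{a}_{\overline{x:\overline{3}|}} \approx \ddot{a}_x + 0.037 = 10.137$.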
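The difference argument of Example 1 is short enough to verify directly. This sketch uses the data of the example; the variable names are illustrative.

```python
a_x = 10.1                      # whole life annuity-due APV
v = 0.96
p_x, p_x1 = 0.99, 0.98          # one-year survival probabilities at x and x+1

p2_x = p_x * p_x1               # two-year survival probability
# The guaranteed payments at times 1 and 2 replace life-contingent ones.
difference = v * (1 - p_x) + v**2 * (1 - p2_x)
certain_and_life = a_x + difference
print(round(difference, 4), round(certain_and_life, 3))
```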

EXAMPLE 2 ([159, N12]*). For a special annuity product on a life aged 50 with a single benefit premium of $50,000 paid immediately, you are given the following information: an annual benefit of $K$ at the beginning of each year is guaranteed for the first five years; after five years, an annual benefit of $K$ at the beginning of each year will be given until death; $i = 0.06$; $a_{50} = 12.267$; $A^{1}_{50:\overline{5}|} = 0.029$; $A_{55} = 0.305$. Calculate the annual benefit $K$.

This exam problem tests the knowledge of a number of formulas.

The APV of the product with a unit benefit (rather than $K$) is

$$a = \sum_{k=0}^{4} v^k + v^5 \cdot {}_5p_{50} \cdot \ddot{a}_{55}. \qquad (3.4.8)$$

We should find all entities in (3.4.8). First, $v = \frac{1}{1 + i} = \frac{1}{1.06} \approx 0.943$.

Secondly, $\ddot{a}_{55} = \frac{1 - A_{55}}{d} = \frac{1 - 0.305}{d} \approx 12.279$, where $d = 1 - v$.

Now, note that $a_{50}$ is an annuity-immediate. We need to know $\ddot{a}_{50} = 1 + a_{50} = 13.267$. Then $A_{50} = 1 - d\,\ddot{a}_{50} = 1 - \frac{0.06}{1.06} \cdot 13.267 \approx 0.249$.

Furthermore, $A_{50} = A^{1}_{50:\overline{5}|} + {}_5p_{50}\,v^5 A_{55}$. All entities in this relation are known except ${}_5p_{50}$. After simple calculations, we get ${}_5p_{50} \approx 0.966$.

Now we know everything to compute $a$ in (3.4.8). Substitution leads to $a \approx 13.323$.

The real annual benefit is $K$, so $K \cdot a = 50{,}000$. This implies $K \approx \frac{50{,}000}{a} \approx 3{,}752$.
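The sequence of steps in Example 2 can be followed in code. The sketch below uses the data of the example and keeps full precision in the intermediate values, so individual figures may differ from the rounded ones in the last digit; the variable names are illustrative.

```python
i = 0.06
a50_immediate = 12.267      # whole life annuity-immediate on (50)
A_term_50_5 = 0.029         # 5-year term insurance on (50)
A55 = 0.305                 # whole life insurance on (55)
premium = 50_000

v = 1 / (1 + i)
d = 1 - v
a55_due = (1 - A55) / d                         # (3.1.4)
a50_due = 1 + a50_immediate                     # annuity-due = 1 + annuity-immediate
A50 = 1 - d * a50_due                           # (3.1.4) solved for A
p5_50 = (A50 - A_term_50_5) / (v**5 * A55)      # from A50 = term + deferred part
a = sum(v**k for k in range(5)) + v**5 * p5_50 * a55_due   # (3.4.8)
K = premium / a
print(round(a55_due, 3), round(A50, 3), round(p5_50, 3), round(a, 3), round(K))
```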


Route 1 = page 505 


4 MORE ON VARYING PAYMENTS 


In the general case of payments varying in time, we are doomed to direct calculations. For example, for computing APVs, we should appeal to the general formulas (1.1.7) and/or (1.2.5). Powerful computers and good software make such calculations tractable.

In special cases, we can, nevertheless, write nice representations.

First, consider a standard increasing life annuity on $(x)$ that is payable continuously. Without loss of generality, we can assume the rate of the growth of payment to be the unit rate, and set $c_t = t$. Then the present value

$$Y = \int_0^T c_t\,e^{-\delta t}\,dt = \int_0^T t\,e^{-\delta t}\,dt = \frac{1 - e^{-\delta T}}{\delta^2} - \frac{T e^{-\delta T}}{\delta}. \qquad (4.1)$$

The term $T e^{-\delta T}$ is the present value of a continuously increasing whole life insurance (see Section 8.3.1). The notation for $E\{Y\}$ is $(\bar{I}\bar{a})_x$, and from (4.1) it follows that

$$(\bar{I}\bar{a})_x = \frac{1 - \bar{A}_x}{\delta^2} - \frac{1}{\delta}(\bar{I}\bar{A})_x. \qquad (4.2)$$

Recalling that $1 - \bar{A}_x = \delta\,\bar{a}_x$, we rewrite it as

$$(\bar{I}\bar{a})_x = \frac{\bar{a}_x - (\bar{I}\bar{A})_x}{\delta}. \qquad (4.3)$$

Let us consider the discrete case in the same manner, setting $c_k = k + 1$. The present value

$$Y = \sum_{k=0}^{K} (k + 1)\,v^k.$$

We use the general formula

$$\sum_{k=0}^{m} (k + 1)\,q^k = \frac{1 - q^{m+1}}{(1 - q)^2} - \frac{(m + 1)\,q^{m+1}}{1 - q}. \qquad (4.4)$$

[It may be derived in many ways; in particular, as follows:

$$\sum_{k=0}^{m} (k + 1)\,q^k = \frac{d}{dq}\sum_{k=0}^{m} q^{k+1} = \frac{d}{dq}\left(\frac{q - q^{m+2}}{1 - q}\right). \qquad (4.5)$$

One can readily verify that the last derivative equals the right member of (4.4).]

Thus,

$$Y = \frac{1 - v^{K+1}}{(1 - v)^2} - \frac{(K + 1)\,v^{K+1}}{1 - v}. \qquad (4.6)$$

From (4.6) it follows that the APV

$$(I\ddot{a})_x = E\{Y\} = \frac{1 - A_x}{d^2} - \frac{1}{d}(IA)_x, \qquad (4.7)$$

where $d = 1 - v$. Since $1 - A_x = d\,\ddot{a}_x$, we have

$$(I\ddot{a})_x = E\{Y\} = \frac{\ddot{a}_x - (IA)_x}{d}. \qquad (4.8)$$

The last formulas are nice and may be of help, but it makes sense also to note that for such a simple type of varying payment, the direct calculation of the APV, especially if we have good software, may turn out to be just as expeditious.


EXAMPLE 1. (a) Consider an annually increasing annuity-due on (50), assuming the lifetime $X$ to be uniform on $[0, 100]$. Let $v = 0.96$. Since $T(x)$ is uniform, $P(K = k) = 1/50$ for $k = 0, \dots, 49$. Then,

$$A_{50} = E\{v^{K+1}\} = \sum_{k=0}^{49} v^{k+1}\,\frac{1}{50} = \frac{1}{50} \cdot \frac{v(1 - v^{50})}{1 - v} \approx 0.418,$$

$$(IA)_{50} = E\{(K + 1)v^{K+1}\} = \sum_{k=0}^{49} (k + 1)v^{k+1}\,\frac{1}{50} = \frac{v}{50}\sum_{k=0}^{49} (k + 1)v^k = \frac{v}{50}\left(\frac{1 - v^{50}}{(1 - v)^2} - \frac{50\,v^{50}}{1 - v}\right) \approx 7.324,$$

where in the step before the last, we used (4.4). Hence, by (4.7),

$$(I\ddot{a})_{50} \approx \frac{1 - 0.418}{(0.04)^2} - \frac{7.324}{0.04} \approx 180.863. \qquad (4.9)$$

(b) Let us compute the same directly. Since $T$ is uniformly distributed, ${}_kp_x = (50 - k)/50$ for $k \le 50$. Then, by (1.2.5),

$$(I\ddot{a})_x = \sum_{k} c_k\,v^k\,{}_kp_x = \sum_{k=0}^{49} (k + 1)\,v^k\left(1 - \frac{k}{50}\right) = \sum_{k=0}^{49} (k + 1)\,v^k - \frac{1}{50}\sum_{k=0}^{49} (k + 1)\,k\,v^k. \qquad (4.10)$$

The first sum may be computed with the use of (4.4). For the second, we need a formula which certainly exists and may be derived by computing the second derivative in (4.5). We skip lengthy calculations here. Note also that the second sum in (4.10) is tractable even for a good calculator. The reader can verify that the answer is the same as in (4.9).
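Both routes of this example, the representation (4.7) and the direct sum (4.10), can be compared numerically. A minimal sketch with the data of the example follows; the variable names are illustrative.

```python
v = 0.96
d = 1 - v
n = 50                      # K is uniform on {0, ..., 49}

# Route (a): via the insurance APVs and relation (4.7).
A = sum(v**(k + 1) for k in range(n)) / n
IA = sum((k + 1) * v**(k + 1) for k in range(n)) / n
Ia_formula = (1 - A) / d**2 - IA / d

# Route (b): direct sum (1.2.5) with c_k = k + 1 and k-year survival (n - k)/n.
Ia_direct = sum((k + 1) * v**k * (n - k) / n for k in range(n))

print(round(A, 3), round(IA, 3), round(Ia_formula, 3), round(Ia_direct, 3))
```

The direct sum needs no closed-form summation formulas at all, which illustrates the remark that direct calculation may be just as expeditious.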


In many problems, practical or just for study purposes, 
varying payments may be represented as a combination of level annuities. 


EXAMPLE 2 ([151, N1]*). A special 30-year annuity-due on a person of age 30 pays 10 for the first 10 years, 20 for the next 10 years, and 30 for the last 10 years. Given ${}_{20}E_{30} = m$, $\ddot{a}_{30:\overline{10}|} = u$, $\ddot{a}_{30:\overline{30}|} = v$, and $\ddot{a}_{50:\overline{10}|} = w$, find the APV.

Consider the annuity paying 20 during the whole 30-year period. Its APV is $20\,\ddot{a}_{30:\overline{30}|} = 20v$. If we subtract from this $10\,\ddot{a}_{30:\overline{10}|} = 10u$, we will get the APV of the annuity paying 10 the first 10 years and 20 during the remaining time. (We can imagine that the annuitant pays 10 units back each year during the first 10 years.) The new annuity has an APV of $20v - 10u$. Next, we add payments of 10 during the last 10 years. From the standpoint of the initial moment, the APV of this additional annuity is the 20-year discount factor times ${}_{20}p_{30}$ times $10\,\ddot{a}_{50:\overline{10}|}$, that is, $10\,{}_{20}E_{30}\,\ddot{a}_{50:\overline{10}|} = 10mw$. (In this example the letter $v$ denotes the given value of $\ddot{a}_{30:\overline{30}|}$, not the discount factor.) So, the answer is $-10u + 20v + 10mw$.


In the next example, we consider temporary increasing annuities which are defined as 
increasing annuities acting in a given period. 


EXAMPLE 3 ([150, N9]*). A person aged 20 buys a special five-year temporary life annuity-due with payments 1, 3, 5, 7, 9. Given: $\ddot{a}_{20:\overline{4}|} = 3.41$, $a_{20:\overline{4}|} = 3.04$, $(I\ddot{a})_{20:\overline{4}|} = 8.05$, $(Ia)_{20:\overline{4}|} = 7.17$ (where the symbols $a_{20:\overline{4}|}$ and $(Ia)_{20:\overline{4}|}$ correspond to the annuities-immediate, that is, to the annuities payable at the end of each time interval). Calculate the net single premium.

The point here is that, while the question concerns a 5-year annuity, the data correspond to a 4-year period. The payments 1, 3, 5, 7, 9 may be represented as $1,\ 1 + 2 \cdot 1,\ 1 + 2 \cdot 2,\ 1 + 2 \cdot 3,\ 1 + 2 \cdot 4$. Then the total payment is equivalent to one unit paid at the beginning, plus the level annuity-immediate with unit rate, plus the increasing annuity-immediate with a starting payment of 2. So, the APV is $1 + a_{20:\overline{4}|} + 2\,(Ia)_{20:\overline{4}|} = 1 + 3.04 + 2 \cdot 7.17 = 18.38$.


5 ANNUITIES WITH m-thly PAYMENTS 


Here we consider the case when payments are made $m$ times a year. It is especially important when one deals with pension plans.

Let $i$ be a given effective annual interest rate, and $d = 1 - v = \frac{i}{1 + i}$ the corresponding annual discount rate; for definitions see Section 0.8.3. We revisit the scheme of Section 8.2.1.4, keeping the same notation and terms. In particular, we call $m$-ths the periods of length $1/m$ from this scheme. (Say, if $m = 12$, an $m$-th is a month.) Consider a whole life annuity with payments of $1/m$ made $m$ times a year, at the moments $0, 1/m, \dots, (m - 1)/m$ of each year. So, for each complete year, the annual payment is one. Denote the present value of this annuity by $Y^{(m)}$, and the APV by $\ddot{a}_x^{(m)}$.

Let us choose for a while an $m$-th as a unit of time, and denote by $v^{(m)}$ the discount factor corresponding to this new unit of time. In view of the general formula $v_t = v^t$, the new discount $v^{(m)} = v^{1/m}$, where $v$ is the annual discount. Set $\tilde{d}^{(m)} = 1 - v^{(m)}$, the counterpart of $d = 1 - v$ in this case. We will soon see why it makes sense to use a tilde here.

Consider, first, the annuity which pays not $1/m$ but one unit of money at the beginning of each $m$-th, and denote the present value of this annuity by $\tilde{Y}^{(m)}$. By virtue of the general representation (2.2.5),



$$\tilde{Y}^{(m)} = \frac{1}{\tilde{d}^{(m)}}\left(1 - Z^{(m)}\right), \qquad (5.1)$$

where $Z^{(m)}$ is the present value of the whole life insurance providing for payment of a unit of money at the end of the $m$-th of death. This is exactly the insurance we considered in Section 8.2.1.4. In particular, $E\{Z^{(m)}\} = A_x^{(m)}$.

It is important that, although (5.1) corresponds to the new unit of time, the dimension of the left and the right members of (5.1) is money, and we keep the unit of money as it was.

Now, we come back to the original annuity with payments $1/m$. Clearly, $Y^{(m)} = \frac{1}{m}\tilde{Y}^{(m)}$, and hence

$$Y^{(m)} = \frac{1}{d^{(m)}}\left(1 - Z^{(m)}\right), \qquad (5.2)$$

where

$$d^{(m)} = m\,\tilde{d}^{(m)} = m\left(1 - v^{1/m}\right).$$

The quantity $d^{(m)}$ is a nominal annual rate of discount for the case when interest is compounded $m$thly. We introduced and discussed it in Section 0.8.5. Briefly, this is an annual discount rate which leads to the effective annual rate of discount $d$, if interest is compounded $m$ times a year.

In particular, as was shown in Section 0.8.5,

$$d^{(m)} \to \delta \ \text{as } m \to \infty. \qquad (5.3)$$

Now we return to (5.2) which implies that

$$\ddot{a}_x^{(m)} = \frac{1}{d^{(m)}}\left(1 - A_x^{(m)}\right). \qquad (5.4)$$

Note that we could come to (5.4) in another way by reasoning, slightly heuristically, as follows. The general representation (2.2.5) implies that (5.4) must be true for an appropriate discount characteristic (a new $d$) in the denominator. Since one unit of money is paid during a year, this characteristic should be annual. Since payments are provided $m$thly, we should consider the case when interest is compounded $m$ times a year. Then the discount characteristic should be the annual rate leading to the annual discount rate $d$. In Section 0.8.5, we have shown that $d^{(m)}$ is exactly the characteristic with this property.

Nevertheless, the detailed derivation above makes the picture more transparent. 

Now, we can either leave (5.4) as itis, or apply the approximation (8.2.1.11) from Section 
8.2.1.4. Assuming that the lifetime is uniformly distributed within each year, we write that 
AL”) = (i/i")A,, where i™ = m[(1+i)!/" — 1], the nominal annual interest rate. (See 
Sections 8.2.1.4 and 0.8.2.) 

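On the other hand, $A_x = 1 - d\,\ddot{a}_x$. Substituting this into (5.4), after simple algebra we get
that

$$\ddot{a}_x^{(m)} = \alpha(m)\,\ddot{a}_x - \beta(m), \tag{5.5}$$

where

$$\alpha(m) = \frac{i\,d}{i^{(m)}d^{(m)}}, \qquad \beta(m) = \frac{i - i^{(m)}}{i^{(m)}d^{(m)}}. \tag{5.6}$$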


The coefficients $\alpha(m)$ and $\beta(m)$ depend only on $m$ and the interest rate $i$. They are a bit
cumbersome but calculable. We omit simple but tedious calculations showing that

$$\alpha(m) \approx 1, \qquad \beta(m) \approx \frac{m-1}{2m}, \tag{5.7}$$

which is used in practice.

In the table below, we present the values of $\alpha(m)$ and $\beta(m)$ for $m = 4, 6, 12$, and $\infty$, and
for $i = 0.03, 0.04, 0.05$, and $0.06$. In particular, it shows the degree of accuracy of the
approximation (5.7).


alpha(m)
  m   | i = 0.03  | i = 0.04  | i = 0.05  | i = 0.06  | approx. (5.7)
  4   | 1.0000683 | 1.0001202 | 1.0001850 | 1.0002653 | 1
  6   | 1.0000708 | 1.0001246 | 1.0001929 | 1.0002751 | 1
  12  | 1.0000723 | 1.0001273 | 1.0001970 | 1.0002810 | 1
  inf | 1.0000728 | 1.0001282 | 1.0001984 | 1.0002820 | 1

beta(m)
  m   | i = 0.03  | i = 0.04  | i = 0.05  | i = 0.06  | approx. (5.7)
  4   | 0.3796529 | 0.3811888 | 0.3827173 | 0.3842386 | 0.375
  6   | 0.4214919 | 0.4230847 | 0.4246698 | 0.4262475 | 0.416...
  12  | 0.4632610 | 0.4648889 | 0.4665080 | 0.4681195 | 0.4583...
  inf | 0.5049631 | 0.5066014 | 0.5082319 | 0.5098546 | 0.5
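
The table entries can be recomputed from the uniform-within-year assumption. The sketch below assumes the closed forms $\alpha(m) = id/(i^{(m)}d^{(m)})$ and $\beta(m) = (i - i^{(m)})/(i^{(m)}d^{(m)})$, which follow from substituting $A_x^{(m)} = (i/i^{(m)})A_x$ and $A_x = 1 - d\ddot{a}_x$ into (5.4); the function name and the convention `m=None` for $m = \infty$ are illustrative choices.

```python
import math

def alpha_beta(i, m=None):
    """Return (alpha(m), beta(m)); m=None stands for the limit m -> infinity."""
    d = i / (1 + i)                         # effective annual discount rate
    if m is None:
        i_m = d_m = math.log(1 + i)         # both nominal rates tend to delta
    else:
        i_m = m * ((1 + i) ** (1 / m) - 1)  # nominal interest rate i^(m)
        d_m = m * (1 - (1 + i) ** (-1 / m)) # nominal discount rate d^(m)
    return i * d / (i_m * d_m), (i - i_m) / (i_m * d_m)

for m in (4, 6, 12, None):
    row = [alpha_beta(i, m) for i in (0.03, 0.04, 0.05, 0.06)]
    print(m, [f"{a:.7f}/{b:.7f}" for a, b in row])
```

Comparing the printed values of $\beta(m)$ with $(m-1)/(2m)$ shows how rough the practical approximation is for the interest rates considered.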


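Once we know a connection between $\ddot{a}_x^{(m)}$ and $\ddot{a}_x$, we can write the relation between the
respective temporary annuities. Proceeding from (3.2.7), which is clearly correct for any
unit of time, we have

$$
\begin{aligned}
\ddot{a}_{x:\overline{n}|}^{(m)} &= \ddot{a}_x^{(m)} - v^n\,{}_np_x\,\ddot{a}_{x+n}^{(m)}
 = \alpha(m)\ddot{a}_x - \beta(m) - v^n\,{}_np_x\bigl(\alpha(m)\ddot{a}_{x+n} - \beta(m)\bigr) \\
&= \alpha(m)\bigl(\ddot{a}_x - v^n\,{}_np_x\,\ddot{a}_{x+n}\bigr) - \beta(m)\bigl(1 - v^n\,{}_np_x\bigr)
 = \alpha(m)\,\ddot{a}_{x:\overline{n}|} - \beta(m)\bigl(1 - v^n\,{}_np_x\bigr).
\end{aligned} \tag{5.8}
$$

In the last step, we used (3.2.7) again. An example is considered in Exercise 39.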


6 MULTIPLE DECREMENT AND MULTIPLE LIFE MODELS 
6.1 Multiple decrement 


If the present value of an annuity is specified only by the value of the failure time T, then
we can find the APV and other characteristics directly, considering the standard annuity
scheme for the lifetime T. If there is a factor that may cause a change in the payment rate
but not a complete cessation, the situation is more sophisticated. We restrict ourselves to
several examples.


EXAMPLE 1 ([151, N12]). Harold has been disabled and will begin receiving disability
payments. You are given: the discount factor $v = 0.95$; the hazard function for recovery
is $\mu_{62}^{\text{recover}}(t) = 0.1(3 - t)$; the hazard function for death is $\mu_{62}^{\text{death}}(t) = 0.1t$; payments of $10,000
begin today, his 62nd birthday, and he will receive $10,000 as long as he has not recovered
or died; there will be no payments made beyond his 65th birthday. Calculate the APV of
Harold's disability payments.

The problem is easy since, in accordance with (7.2.1.8), the total force of decrement is

$$\mu_{62}(t) = \mu_{62}^{\text{recover}}(t) + \mu_{62}^{\text{death}}(t) = 0.1(3 - t) + 0.1t = 0.3,$$

that is, a constant. In this case, it is easier to compute the APV directly by (3.2.5). Let
$p = P(T(62) > 1) = e^{-\mu} = e^{-0.3} \approx 0.741$. Then ${}_kp_{62} = e^{-\mu k} = p^k$. Since there will be at
most four payments, we deal with

$$\text{APV} = 10000\,(1 + vp + v^2p^2 + v^3p^3) \approx 25483.35.$$


EXAMPLE 2. Let all data be the same as above, except that $\mu_{62}^{\text{recover}}(t) = 0.2(3 - t)$. Then

$$\mu_{62}(t) = \mu_{62}^{\text{recover}}(t) + \mu_{62}^{\text{death}}(t) = 0.2(3 - t) + 0.1t = 0.6 - 0.1t,$$

and, in accordance with (7.2.1.9),

$${}_tp_{62} = \exp\left\{-\int_0^t (0.6 - 0.1s)\,ds\right\} = \exp\{-0.6t + 0.05t^2\}$$

for $t \le 3$. Then

$$\text{APV} = 10000\sum_{k=0}^{3} v^k\,{}_kp_{62} = 10000\sum_{k=0}^{3} (0.95)^k \exp\{-0.6k + 0.05k^2\} \approx 21023.7963,$$

as one may calculate using even a calculator.
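
The sums in Examples 1 and 2 can be sketched in a few lines. The helper below is illustrative (its name and signature are not from the text): it takes the integrated total force of decrement and adds up the discounted persistence probabilities at the payment dates.

```python
import math

def apv_annual(payment, v, cum_hazard, n_payments):
    """APV of `payment` made at times 0, 1, ..., n_payments-1 while the status persists.

    cum_hazard(k) is the total force of decrement integrated over [0, k],
    so the probability of still receiving payments at time k is exp(-cum_hazard(k)).
    """
    return payment * sum(v**k * math.exp(-cum_hazard(k)) for k in range(n_payments))

# Example 1: constant total force 0.3, integrated force 0.3*k.
apv1 = apv_annual(10_000, 0.95, lambda k: 0.3 * k, 4)
# Example 2: total force 0.6 - 0.1t, integrated force 0.6*k - 0.05*k**2 (valid for k <= 3).
apv2 = apv_annual(10_000, 0.95, lambda k: 0.6 * k - 0.05 * k**2, 4)
print(round(apv1, 2), round(apv2, 2))
```

With the unrounded $p = e^{-0.3}$ the first value comes out near 25476.6; the figure 25483.35 in Example 1 corresponds to rounding $p$ to 0.741 before taking powers. The second value agrees with Example 2.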


EXAMPLE 3. Let us return to Example 1 and assume, for variety, that payments are
provided continuously. Suppose that, if Harold is alive but is not recovered, the company
pays at an annual rate of $c = 10,000$. If Harold recovers, the company continues to pay up
to Harold's death at a smaller rate of $\tilde{c} = 5000$. Find the APV under the assumption that
the causes of decrement mentioned are acting independently.

The hazard rates for recovery and for death are given by the functions

$$\mu^{(1)}(t) = 0.1(3 - t) \ \text{ if } t \le 3, \ \text{ and } = 0 \text{ otherwise},$$

and

$$\mu^{(2)}(t) = 0.1t,$$

respectively.
Reprinted with permission of the Casualty Actuarial Society. 




We can apply the scheme of Section 7.2.2 from which we know that if the causes are
independent, then the hazard rate $\mu^{(j)}(t)$ coincides with the marginal hazard rate
corresponding to the cause $j$; see Section 7.2.2 for detail. For us, this means that the hazard rate
for the pure whole life annuity acting independently equals $\mu^{(2)}(t)$.

Let us now observe that the annuity under consideration may be represented as the sum
of the following two annuities:

• the whole life annuity with the payment rate $\tilde{c}$ and, as we know, with the hazard rate
$\mu^{(2)}(t)$;

• the multiple decrement whole life annuity with the payment rate $c - \tilde{c}$ and the hazard
rates $\mu^{(1)}(t)$ and $\mu^{(2)}(t)$.


Indeed, denote by $\mathcal{E}_t$ the event that Harold is alive at time $t$, and let $\mathcal{C}_t$ denote the event
that Harold is still disabled at time $t$. Consider an infinitesimally small interval $[t, t + dt]$.
The company pays $c\,dt$ if Harold is alive but not recovered, and pays $\tilde{c}\,dt$ if Harold is alive
and healthy. We can represent it as

$$\bigl[\tilde{c}\,1_{\mathcal{E}_t} + (c - \tilde{c})\,1_{\mathcal{E}_t \cap \mathcal{C}_t}\bigr]\,dt,$$

where, as usual, $1_{\mathcal{E}}$ is the indicator of an event $\mathcal{E}$.

The last relation corresponds to the sum of the two annuities mentioned. 

Now note that for the first annuity, ${}_tp_x = \exp\{-\int_0^t \mu^{(2)}(s)\,ds\}$, and the similar formula is
true for the second annuity. Then, in accordance with (1.1.7), the APV of the first annuity
is

$$a' = \tilde{c}\int_0^\infty e^{-\delta t}\exp\left\{-\int_0^t \mu^{(2)}(s)\,ds\right\}dt = \tilde{c}\int_0^\infty e^{-\delta t}\exp\{-0.05t^2\}\,dt = \tilde{c}\int_0^\infty \exp\{-\delta t - 0.05t^2\}\,dt.$$

Up to a constant multiplier, the integrand is a normal density, so completing the square, we
can calculate the integral directly. Another way is to use software. In any case, recalling
that $\delta = -\ln(0.95)$, one can easily get that the integral is approximately equal to 3.50, and
hence

$$a' \approx 3.50\,\tilde{c}.$$
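
Completing the square can be sketched as follows. The helper is an illustration (its name is not from the text): for $a > 0$ it uses the identity $\int_L^\infty e^{-bt - at^2}dt = e^{b^2/(4a)}\,\tfrac12\sqrt{\pi/a}\;\mathrm{erfc}\bigl(\sqrt{a}\,(L + b/(2a))\bigr)$, with `math.erfc` from the standard library.

```python
import math

def gaussian_tail_integral(a, b, lower=0.0):
    """Integral over [lower, inf) of exp(-b*t - a*t**2) dt for a > 0."""
    shift = b / (2.0 * a)                         # centre of the completed square
    return (math.exp(b * b / (4.0 * a))           # constant factor from completing the square
            * 0.5 * math.sqrt(math.pi / a)        # half of the full Gaussian integral
            * math.erfc(math.sqrt(a) * (lower + shift)))

delta = -math.log(0.95)
integral = gaussian_tail_integral(0.05, delta)    # integrand exp(-delta*t - 0.05*t**2)
print(round(integral, 4))
```

Multiplying the result by the payment rate of the first annuity gives its APV.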


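The second annuity is the whole life annuity with the hazard rate $\mu^{(1)}(s) + \mu^{(2)}(s)$. First, we
calculate

$$\int_0^t \bigl(\mu^{(1)}(s) + \mu^{(2)}(s)\bigr)ds =
\begin{cases}
0.3t & \text{if } t \le 3, \\
0.9 + \int_3^t 0.1s\,ds = 0.45 + 0.05t^2 & \text{if } t > 3.
\end{cases}$$

Then the APV for the second annuity is equal to

$$a'' = (c - \tilde{c})\int_0^\infty e^{-\delta t}\exp\left\{-\int_0^t \bigl(\mu^{(1)}(s) + \mu^{(2)}(s)\bigr)ds\right\}dt
= (c - \tilde{c})\left(\int_0^3 e^{-\delta t}\exp\{-0.3t\}\,dt + \int_3^\infty e^{-\delta t}\exp\{-0.45 - 0.05t^2\}\,dt\right).$$

The last integrals are also analytically calculable (especially the first!). So, either directly
or using software, we will come to ≈ 1.85 for the first integral, and ≈ 0.82 for the second.
Eventually,

$$a'' \approx (c - \tilde{c})\,2.67,$$

and the total APV

$$a = a' + a'' \approx 5000\,(3.50 + 2.67) = 30850.$$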


6.2 Multiple life annuities 


First, we establish the counterpart of (8.4.2.2) for annuities. Consider an annuity whose
present value is completely specified by a function $\theta(T)$, where $T$ is the lifetime of a
particular status (single or multiple life) under consideration. Consider a status of two lives,
$(x)$ and $(y)$. Then, absolutely similarly to what we did in Section 4.2, one may derive that

$$a_{x:y} + a_{\overline{x:y}} = a_x + a_y. \tag{6.1}$$

Here the $a$'s denote the APVs of the annuity specified by the same function $\theta(\cdot)$ for the
statuses $x:y$, $\overline{x:y}$, and separate lives $(x)$, $(y)$, respectively.

The same relation is certainly true for the discrete time case. Since the function $\theta(t)$ is
arbitrary, it concerns all various types of annuities we considered earlier.


EXAMPLE 1. (a) Consider two persons of ages $x$ and $y$, whose future lifetimes $T_1$ and
$T_2$ are independent and exponentially distributed with parameters $\mu_1 = 0.04$ and $\mu_2 = 0.05$,
respectively. The persons buy the joint annuity paying continuously at unit rate until the
first death.

As we know (e.g., see Example 7.3.2-6), the r.v. $T = \min\{T_1, T_2\}$ is exponential with
parameter $\mu = \mu_1 + \mu_2$. Then in accordance with Example 1.1-1a,

$$\bar{a}_{x:y} = \frac{1}{\mu + \delta} = \frac{1}{\mu_1 + \mu_2 + \delta} = \frac{1}{0.09 + \delta}. \tag{6.2}$$

(b) Consider the annuity paying at the same unit rate until the second death, so the
lifetime of the status is $T = \max\{T_1, T_2\}$. We can compute $\bar{a}_{\overline{x:y}}$ directly, but it is better to apply
(6.1), which immediately leads to

$$\bar{a}_{\overline{x:y}} = \frac{1}{\mu_1 + \delta} + \frac{1}{\mu_2 + \delta} - \frac{1}{\mu_1 + \mu_2 + \delta}. \tag{6.3}$$


Now consider the whole life annuity paying at a rate $c_1$ until the first death, and at a rate
$c_2$ after the first death and until the second. This may be considered as the sum of two
annuities: paying at the rate $c_2$ until the second death, and paying at the rate $c_1 - c_2$ until
the first death. By virtue of (6.1), the APV in this case is

$$a = (c_1 - c_2)\bar{a}_{x:y} + c_2\bar{a}_{\overline{x:y}} = (c_1 - c_2)\bar{a}_{x:y} + c_2(\bar{a}_x + \bar{a}_y - \bar{a}_{x:y}) = (c_1 - 2c_2)\bar{a}_{x:y} + c_2(\bar{a}_x + \bar{a}_y). \tag{6.4}$$


The same formula is true for the discrete case.

It is noteworthy that in the case $c_1 - c_2 < 0$, (6.4) keeps working: if $c_1 < c_2$, we may view
this as if the insurer pays $c_2$ until the second death but simultaneously subtracts $|c_1 - c_2|$ until
the first death.


EXAMPLE 2. In the case of Example 1, formula (6.4) gives

$$a = \frac{c_1 - 2c_2}{\mu_1 + \mu_2 + \delta} + c_2\left(\frac{1}{\mu_1 + \delta} + \frac{1}{\mu_2 + \delta}\right).$$


EXAMPLE 3 ([152, N5]). A pair of twins age (30) purchases a fully continuous joint
life contract involving an annuity along with life insurance. Namely, the contract pays: the
annuity of 1000 per year while both are alive; 1000 at the moment of the first death; the
annuity of 600 per year after the first death until the second death; 800 at the moment of
the second death. The future lifetimes of the twins are i.i.d.; $\delta = 0.05$; $\mu_x(t) = 0.04$ for all
$x$ and $t$. Find the APV.

Let the unit of money be $100. The pure annuity part corresponds to the case of Example
2 with $\mu_1 = \mu_2 = 0.04$, $c_1 = 10$, $c_2 = 6$, so the APV of this part is

$$a = \frac{10 - 2\cdot 6}{0.08 + 0.05} + 2\cdot\frac{6}{0.04 + 0.05} \approx 117.9487.$$

Now, let us consider the life insurance part. It may be viewed as the sum of two insurances:
for the joint and for the last-survivor statuses.

If $T_1$ and $T_2$ are the separate lifetimes, the r.v. $T = \min\{T_1, T_2\}$ is exponentially distributed
with parameter $\tilde{\mu} = 2\mu_x(t) = 0.08$ (see, e.g., Example 7.3.2-6).

In the continuous-time exponential case, $\bar{A}_x = \mu/(\mu + \delta)$; see (8.1.1.8).
Keeping this in mind, we have $\bar{A}_{30:30} = \tilde{\mu}/(\tilde{\mu} + \delta) = 0.08/0.13 \approx 0.6154$ and
$\bar{A}_{30} = 0.04/0.09 \approx 0.4444$. By (8.4.2.2), $\bar{A}_{\overline{30:30}} \approx 2\cdot 0.4444 - 0.6154 = 0.2734$.

Consequently, the total APV for the insurance part equals $A = 10\bar{A}_{30:30} + 8\bar{A}_{\overline{30:30}} \approx 10\cdot 0.6154 + 8\cdot 0.2734 = 8.3412$.

Eventually, the APV of the whole product is approximately $100\cdot(117.948 + 8.3412) = 12,628.92$.
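
The arithmetic of this example can be retraced with formulas (6.2)-(6.4) for independent exponential lifetimes. The sketch below is illustrative; the function names are not from the text.

```python
def joint_annuity(mu1, mu2, delta):
    """APV of a unit-rate continuous annuity payable until the first death, as in (6.2)."""
    return 1.0 / (mu1 + mu2 + delta)

def last_survivor_annuity(mu1, mu2, delta):
    """APV of a unit-rate continuous annuity payable until the second death, as in (6.3)."""
    return 1.0 / (mu1 + delta) + 1.0 / (mu2 + delta) - joint_annuity(mu1, mu2, delta)

def two_rate_annuity(c1, c2, mu1, mu2, delta):
    """Rate c1 until the first death, rate c2 from then until the second death, as in (6.4)."""
    return (c1 - c2) * joint_annuity(mu1, mu2, delta) + c2 * last_survivor_annuity(mu1, mu2, delta)

mu, delta = 0.04, 0.05
annuity_part = two_rate_annuity(10, 6, mu, mu, delta)   # unit of money: $100

A_joint = 2 * mu / (2 * mu + delta)    # unit insurance payable at the first death
A_single = mu / (mu + delta)           # unit whole life insurance on one life
A_last = 2 * A_single - A_joint        # unit insurance payable at the second death
insurance_part = 10 * A_joint + 8 * A_last

total = 100 * (annuity_part + insurance_part)
print(round(annuity_part, 4), round(insurance_part, 4), round(total, 2))
```

Without intermediate rounding the total comes out near 12,629.06, which agrees with the figure above up to the rounding of the insurance values to four decimals.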


EXAMPLE 4. The lifetimes $T_1$ and $T_2$ of a husband and wife are independent and
uniformly distributed on [0, 50]; the discount $v = 0.96$. A special annuity pays continuously
at a rate of $c$ until the first death. If it is the husband's death, the annuity continues at the
same rate. If the wife dies first, the rate changes to a rate of $\tilde{c}$. Find the APV.

This example is close to Example 8.4.2-4, and as we did there, we will consider 50
years as a unit of time. The annuity under consideration may be viewed as the sum of two
annuities:

• the whole life annuity for the wife with the rate $c$, and
• the annuity payable in the period $[T_2, T_1]$ at rate $\tilde{c}$, provided that $T_1 > T_2$.


Reprinted with permission of the Casualty Actuarial Society. 




By (3.1.3) from Example 3.1-1, the present value per unit payment rate for the first
annuity is

$$a' = \frac{e^{-\delta} - 1 + \delta}{\delta^2}, \tag{6.5}$$

where $\delta$ is the rate corresponding to the 50-year period. As we computed in Example
8.4.2-4, $\delta = 2.0411$, and inserting it into (6.5), we get $a' \approx 0.2811$.
Consider the second annuity, for a while assuming the payment rate to be unit rate. The
present value of this annuity is the r.v.

$$Y_1 = \begin{cases}
0 & \text{if } T_1 \le T_2, \\
\int_{T_2}^{T_1} e^{-\delta u}\,du = \frac{1}{\delta}\bigl(e^{-\delta T_2} - e^{-\delta T_1}\bigr) & \text{if } T_1 > T_2.
\end{cases} \tag{6.6}$$

To compute the expected value of this r.v., we should take the integral of the function
$\frac{1}{\delta}(e^{-\delta s} - e^{-\delta t})$ multiplied by the joint density of the r.v.'s $T_1, T_2$ over the region $\{(t,s) : t > s\}$.

The joint density of two independent (!) r.v.'s uniformly distributed on [0, 1] is the
function $f(t,s) = 1\cdot 1 = 1$. So,

$$a'' = E\{Y_1\} = \int_0^1\!\!\int_0^t \frac{1}{\delta}\bigl(e^{-\delta s} - e^{-\delta t}\bigr)\,ds\,dt = \frac{e^{-\delta}(2 + \delta) + \delta - 2}{\delta^3}, \tag{6.7}$$

which can be easily verified by double integration. Inserting the value of $\delta$, we get $a'' \approx
0.0513$. Eventually, the APV for the total annuity is $a = ca' + \tilde{c}a'' \approx 0.2811c + 0.0513\tilde{c}$.


7 EXERCISES 


The use of software for computing integrals or sums in problems below is recommended. 


Sections 1-3 
1. A special 3-year temporary annuity-due is specified by the following table. 


k 0 1 2 
payment 10 20 15 
qx+k 0.02 0.03 0.04 


Find the mean and variance of the present value in the case of v = 0.96. (Hint: It may happen 
that you do not need all the data given.) 


2. Consider the annuity-due payable to (30) during 10 years. The payment in the kth year is 
equal to vk+1 . Survival probabilities are evaluated in accordance with Table 7-1.5.1-1. 
Using spreadsheet software, compute (a) the APV applying the current payment technique; 
(b) the APV and variance applying the aggregate payment technique. 


3. What is the difference between G9 and ax) ? 


4. Explain from a heuristic point of view that typically $\bar{a}_{x+1} < \bar{a}_x$ and $\ddot{a}_{x+1} < \ddot{a}_x$. Is it always
true? Give a heuristic argument and a mathematical explanation. Proceeding from (2.2.2)
and (2.2.6), show that your reasoning is consistent with what we did in Exercise 8.1b.


5. The problem concerns level annuities with unit rate.

(a) Show that if $\delta > 0$, then the r.v. $Y$ is a bounded r.v. What is its largest value in the
continuous and discrete time cases? Explain why the answer should be expected.

(b) What is $Y$ equal to if $\delta = 0$? Give also a heuristic explanation. Is $Y$ bounded in this
case? (Certainly, in real life all r.v.'s are bounded, so we are talking about the $Y$
from our model.) What is $Y$ equal to if $\delta = 0$ in the case of the whole life annuity with
continuous payment? Illustrate your conclusion considering the case (3.1.6) including
the expression for the variance.

6. Proceeding from relation (2.2.1), solve the following problems. (a) How is $Y$ distributed if $Z$
is uniform on [0, 1]? (b) Given an interest rate $\delta$ and a constant force of mortality $\mu$, find the
distribution of $Y$ for the whole life continuous annuity. Consider the case $\delta = \mu$ separately.
(Advice: Look up Exercise 8.11.)

7. Given an interest rate $\delta$, find the distribution of $Y$ for the whole life continuous annuity if
$T(x)$ is uniform on $[0, c]$. Analyze the case $\delta \to 0$.

8. (a) Certainly, $\bar{a}_x$ cannot be negative. Show that the numerator in (3.1.3) is indeed
non-negative. (Advice: Consider the numerator at s = 0, and take its derivative in s.)

(b) Consider the limit of (3.1.3) as $\delta \to 0$ (for example, applying L'Hôpital's rule). Explain
the answer from an economic and probabilistic point of view.


9. Is it correct that the larger the value of an annuity to be paid, the less the value of the
corresponding insurance plan?

10. Is it true that $\bar{a}_{x:\overline{n}|} = \delta^{-1}\bigl(1 - \bar{A}^1_{x:\overline{n}|}\bigr)$? If not, what was not taken into account?

11. (a) Assuming that $\ddot{a}_{99} = 1$ and using (3.1.9) and Table 7-1.5.1-1, provide a worksheet for
computing $\ddot{a}_x$ for $x = 0, ..., 99$ and $\delta = 0.04$.

(b) Proceeding from the assumption in Example 8.2.1.2-2, calculate $\ddot{a}_{99}$. After that,
provide a worksheet for computing $\ddot{a}_x$ for smaller integer $x$'s.

12. Proceeding from (2.2.4), write a formula for $\mathrm{Var}\{Y\}$ in terms of the APVs of annuities, using
the symbol ${}^2a$ for annuities corresponding to the double rate.

13. Consider the recursion formula

$$\bar{a}_x = \bar{a}_{x:\overline{1}|} + v\,p_x\,\bar{a}_{x+1}. \tag{7.1}$$

(a) Show that it is almost obvious from a heuristic point of view. Compare it with (3.1.9).
(b) Prove it rigorously. (Advice: Split the integral in (3.1.7) into two parts, and make the
variable change $s = t - 1$ in the second integral.)

14. Provide a worksheet for the recursion procedure (3.2.10). Using data from Table 7-1.5.1-1,
compute, for instance, $\ddot{a}_{30:\overline{25}|}$.

15. In each pair, which is larger: (i) $\ddot{a}_x$ or $\bar{a}_x$; (ii) $\ddot{a}_x$ or $\ddot{a}_{x:\overline{n}|}$; (iii) $\ddot{a}_x$ or $A_x$?

16. Give examples of when $\bar{a}_x > \bar{A}_x$, and when $\bar{a}_x < \bar{A}_x$. (Advice: Consider the cases when $T(x)$
"is large and when it is small." A good particular example may concern the exponential case.)


17. Show that for an $m$-deferred whole life annuity, in the continuous payment case, the present
value

$$Y = 0 \ \text{ if } T(x) \le m, \qquad Y = v^m\,\bar{a}_{\overline{T-m}|} \ \text{ if } T > m$$

[the definition of $\bar{a}_{\overline{t}|}$ is given in (2.1.2)]; and in the case of discrete payments,

$$Y = 0 \ \text{ if } K = 0, ..., m-1, \qquad Y = v^m\,\ddot{a}_{\overline{K+1-m}|} \ \text{ if } K = m, m+1, ...$$

[for $\ddot{a}_{\overline{n}|}$, see (2.1.4)].

18. Derive (3.3.6)-(3.3.7) directly from (3.3.2)-(3.3.3). (Advice: Replace ${}_tp_x$ by ${}_mp_x\cdot{}_{t-m}p_{x+m}$
and $e^{-\delta t}$ by $e^{-\delta m}e^{-\delta(t-m)}$.)

19. Show that

$$\ddot{a}_{x:\overline{n}|} = \sum_{k=0}^{n-1} {}_kE_x.$$

From which formula does it immediately follow? By what technique did we get this formula?

20. (a) Which of the quantities $\bar{a}_x$, $\bar{a}_{x:\overline{n}|}$, ${}_{m|}\bar{a}_x$, $\ddot{a}_x$, $\ddot{a}_{x:\overline{n}|}$, ${}_{m|}\ddot{a}_x$ are non-increasing in $\delta$?

(b) Write the limits of the quantities above for $\delta \to 0$ and $\delta \to \infty$ without calculations,
proceeding from common sense.

(c) Considering, as an example, ${}_{m|}\bar{a}_x$, show how your answers follow from the
corresponding mathematical representations.


21. Which is larger: diz or Gy? When üz = ax? When äi is close to dy? Find the limit of
Gig as $\delta \to 0$.


22. Write formula (3.2.6) for the case $\delta = 0$ and prove it directly.

23. Write the counterparts of (3.2.6) and (3.2.8) for the case of continuous payment.

24. Check whether you can write without calculations, not memorizing but proceeding from
common sense, the relations between (i) $\bar{a}_x$ and $\bar{a}_{x:\overline{n}|}$; (ii) $\ddot{a}_x$ and $\ddot{a}_{x:\overline{n}|}$; (iii) $\ddot{a}_x$ and ${}_{m|}\ddot{a}_x$; (iv) $\bar{A}_x$
and $\bar{a}_x$; (v) $A_x$ and $\ddot{a}_x$; (vi) $A_{x:\overline{n}|}$ and $\ddot{a}_{x:\overline{n}|}$; (vii) $A^1_{x:\overline{n}|}$ and $\ddot{a}_{x:\overline{n}|}$. Write other relations of the above
types that you consider valuable. (Advice: Certainly, you may include other characteristics
in relations. A relation, say, between $\ddot{a}_x$ and ${}_{m|}\ddot{a}_x$, as a matter of fact, should be a relation
between ${}_{m|}\ddot{a}_x$ and $\ddot{a}_y$, with some $y$ different from $x$.)


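25. Given $\bar{a}_x = 10.1$, $\bar{a}_{x:\overline{20}|} = 8.5$, $\bar{a}_{x+20} = 6.2$, and $\delta = 0.03$, find $\bar{A}^1_{x:\overline{20}|}$ and ${}_{20}p_x$.

26. Given $\ddot{a}_x = 10.1$, $\ddot{a}_{x+1} = 10.1145$, and $p_x = 0.985$, find $v$.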


27. Ann is 20 years old now. Her kind and wealthy uncle wants to make a single deposit into a
special account for Ann to receive the annuity-due of $20,000 per year during all her life. For
some reason, Ann prefers to start receiving money later but in larger amounts, for example,
starting from the age of thirty. Setting $\delta = 0.04$ and using the Illustrative Table, calculate
how much Ann should receive per year in the latter case under the condition that the value of
the uncle's gift will not change.

28. Mr. A usually spends a part of his vacation in a resort "Not too bad" and comes back the
next year with probability 0.9. The amounts of money spent each year are independent r.v.'s
uniformly distributed on [6, 10] (in some units of money, say $100). The annual discount is
0.96. Compute (a) the expected present value and (b*) the variance of the present value of
the money spent by Mr. A until the first time when he does not show up in the next year.


29. For a certain population, we have ${}_{35}p_{30} = 0.95$, and for the rate
$\delta = 3\%$, the APVs: $A_{30} = 0.2$, $A_{65} = 0.44$; while for
$\delta = 6\%$, the APVs: $A_{30} = 0.09$, $A_{65} = 0.235$.

(a) For $\delta = 3\%$, compute the actuarial present value of annuities of $100,000 paid at the
beginning of each year to a person starting from the age of 30, until death. Find the
mean and variance of the present value.

(b) Do the same if payments are made until death but not longer than 35 years.


30. Using the current payment technique, compute $\bar{a}_{70}$ as a function of $\delta$ under the condition that
$T(70)$ has the density $f_T(x) = 0.005(20 - x)$ on [0, 20].

31. Michael is 20 years old. His parents (or a bank; in this case, it would be a loan) have
agreed to pay him $20,000 a year during 5 years, so he would be able to get his education.
For simplicity, assume that the payments are made at the beginning of each year. Using the
Illustrative Table, compute the net single premium and the corresponding variance for the
annuities mentioned.

32. Using the Illustrative Table and linear interpolation, estimate 29.


Section 4*

33. Let $T(50)$ be uniformly distributed on [0, 50], $v = 0.96$. Find the APV of the 15-year deferred
continuous annuity paying at an annual rate of 2 for the first 10 years after the beginning of
payment and at unit rate thereafter but totally not longer than 20 years.

34. (a) Explain from a heuristic point of view and show rigorously that

$$(I\ddot{a})_x = \sum_{k=0}^{\infty} {}_{k|}\ddot{a}_x, \quad \text{and} \quad (I\bar{a})_x = \sum_{k=0}^{\infty} {}_{k|}\bar{a}_x. \tag{7.2}$$

(b) Write the series $\sum_{k=0}^{\infty} {}_{k|}\bar{a}_x$ for the case when $T(x)$ is uniformly distributed on $[0, M]$,
for an integer $M$ and $\delta = 0$. Compute this series.

35. Consider a continuous annuity on (30) paying at unit rate in the first year, at a rate of 1.2
during the second year (if the annuitant survives the first year), at a rate of 1.4 during the
third year, and so on, up to a rate of 5 units. Then the rate is level at 5. The interest rate
$\delta = 0.04$. Find the net single premium (a) if the lifetime $X$ is uniform on [0, 100]; (b) if $T(30)$
is exponential and $\mathring{e}_{30} = 40$. (c) How would you compute the APV for the corresponding
annuity-due having data from a life table similar to what we have in the Illustrative Table?

36. (This problem is close to [151, N1].) A special 45-year annuity-due on (20) pays 30 for the
first 5 years, 20 for the next 15 years, and 10 for the last 25 years. Given ${}_{20}p_{20}$, the discount
$v$, $\ddot{a}_{20:\overline{5}|}$, $\ddot{a}_{20:\overline{20}|}$, and $\ddot{a}_{40:\overline{25}|}$, find the APV.

37. Write the counterpart of (7.2) for $(Ia)_x$.


Sections 5-6*

38. Under the assumption that the lifetime is uniformly distributed within each year of age, write
the relation between $\bar{a}_x$ and $\ddot{a}_x$. Make sure that your answer does not contradict (5.5) if we let
$m \to \infty$. How will the relation look for $\delta = 0$? Explain why the answer should be expected.

The nice idea of this exercise is taken from [43, Ex.C.4-5.].


39. In the same spreadsheet you provided in Exercise 11a, compute the APVs for the cases of
monthly and quarterly payments.


40. A participant K. of a pension plan is now exactly 61. His salary increases by 3% every year
on his birthday, and his annual salary for the age year starting now is $60,000. He has been in
service for 10 years prior to the present time. For simplicity, we assume that retirement takes
place on a birthday. There are two causes for the termination of the plan before retirement:
death and withdrawal. The corresponding probabilities may be estimated from the table
below. In particular, one can see that the earliest retirement time is K.'s 63rd birthday, and
the latest is his 65th.

Let $\delta = 0.04$. For any further information refer to the Illustrative Table.

  Age x | In service | Deaths | Withdrawals | Retirements
  61    | 100        | 1      | 9           | 0
  62    | 90         | 2      | 8           | 0
  63    | 80         | 3      | 7           | 10
  64    | 60         | 4      | 0           | 6
  65    | 50         | 0      | 0           | 50

(a) The retirement benefit (which is a part of the plan) pays a monthly annuity-due at an
annual rate of 2% of the last salary times the number of years of service at the time of
retirement. Calculate the APV of the retirement part of the plan.

(b) The plan includes a death benefit of $5,000 times the number of years in service paid
at death, provided it occurs while K. is still in service. Calculate the APV of this part
of the plan and the total APV.

(c) K. has an offer to move to another company. If he withdraws from the plan now, he
will receive his own past contributions to the plan plus interest, amounting to $12,700.
K. wants the new employer to compensate him for the loss of value in the pension
plan. What should this compensation be? Assume for simplicity that K.'s contributions
during the last 10 years were level and made at the beginning of each year.


41. We revisit Example 6.1-3. Assume that the annuity terminates if Harold dies or recovers,
whichever comes first. However, if Harold attains the age of 65, he will be paid a
continuous life annuity at rate $c$. Explain why in this case we do not need the assumption of the
independence of the causes. Find the APV.

42. Let us look at (6.2). If $\mu_2 = 0$, then $\bar{a}_{x:y} = 1/(\mu_1 + \delta)$. Can it be predicted? What does it
mean? If $\mu_1 = \mu_2 = 0$, then $\bar{a}_{x:y} = 1/\delta$. Show that it is obvious and could be obtained without
considering the special exponential case. Analyze (6.3) in the same manner.

43. (a) Let $T_1(50)$ and $T_2(50)$ be independent and uniformly distributed on [0, 50], and $\delta =
0.04$. Find $\bar{a}_{50:50}$ and $\bar{a}_{\overline{50:50}}$.
(b) Solve the problem if $T_1(50)$ is uniform on [0, 40], and the other conditions are the same.

44. Write a general formula for the deferred continuous annuity paying at unit rate after the first
death and until the second.

45. The lifetimes $T_1$ and $T_2$ of a husband and wife are independent and exponentially distributed
with the same parameter $\mu$. A special annuity pays continuously at unit rate during the period
$[T_1, T_2]$, provided that $T_1 < T_2$. Find the APV for a given $\delta$. (Hint: If you use the result of
Exercise 44, the problem will not require lengthy calculations.)

46. Solve Exercise 45 in the case when the forces of mortality for $T_1$ and $T_2$ are different: $\mu_1$ and
$\mu_2$. Make sure that the answer does not contradict the previous.


Chapter 10 


Premiums and Reserves 


In Chapters 2 and 3, we already considered premium determination; more precisely, pre- 
miums ensuring the solvency of insurance mechanisms. The models of those chapters 
concerned short-term insurance and single premiums; that is, premiums paid just one time 
at the beginning of the period under consideration. 

As has been already mentioned in Section 2.4, in the case of life insurance or future annu- 
ities, for example, pension plans, the policies are based not on single premiums but rather 
on sequences of premium payments to be carried out at certain rates. We will call such 
sequences of payments premium annuities. In this case, when determining a premium rate, 
the company usually proceeds from the total future random loss, which is the difference 
between the payments the company will provide in accordance with the contract and the 
total amount of premiums the company will receive until the contract is terminated. The 
corresponding models are explored in Section 1. 

After we have considered problems concerning premiums, we will proceed, in Section
2, to the notion of reserve, which may be defined as the value of the future liability of the
insurer.


1 PREMIUM ANNUITIES 
1.1 General principles 


Consider a risk portfolio. Denote by $Z$ the present value of the future payments of the
company, and by $Y_P$ the present value of the total premium to be paid. The subscript $P$
indicates that the latter present value depends on the premium rate or the premium rates
if we deal with many different contracts with different premiums. So far, these rates are
involved in $Y_P$ implicitly, but soon we will write explicit expressions.

The present value of the total loss of the company is the r.v.

$$L_P = Z - Y_P. \tag{1.1.1}$$


For the reader who took Route 2, note that the ideology of premium principles in this
case is close to what we discussed in Section 2.4 but we apply these principles to the total
loss $L_P$. However, to understand the material below, one does not need to know the material
of the section mentioned.


We distinguish the three following approaches. 




A. The equivalence principle. In this case, we require the losses to be equal to zero on
the average, i.e.,

$$E\{L_P\} = 0.$$

Clearly, it implies

$$E\{Z\} = E\{Y_P\}, \tag{1.1.2}$$

which is an equation for $P$, so far given in an implicit way. Premiums based on this
principle are called benefit premiums or net premiums (more precisely, net premium
rates).


B. The percentile principle. We fix a small probability level $\gamma$, and require

$$P(L_P > 0) \le \gamma. \tag{1.1.3}$$

Let $I_P = -L_P = Y_P - Z$, the profit of the portfolio. Then (1.1.3) is equivalent to the relation
$P(I_P < 0) \le \gamma$, or $q_\gamma(I_P) \ge 0$, where $q_\gamma(I_P)$ is the $\gamma$-quantile of $I_P$. Thus, in essence, we deal
with the VaR criterion; see Section 1.1.2.2.

Premiums based on (1.1.3) are called percentile premiums. (The words 'percentile' and
'quantile' may be considered synonyms; the only difference is that in the former case
probability is measured in percents.)


C. The utility equivalence principle. Given a utility function $u(x)$ and an initial surplus
(or wealth) $w$ corresponding to the portfolio, we compare the expected utility in the
case of insurance (i.e., $E\{u(w - L_P)\}$) and the expected utility in the case where
the insurance is not carried out (i.e., just $u(w)$ because in this case, the utility is not
random). We require the former expected utility to be larger than the latter, or in the
boundary case,

$$E\{u(w - L_P)\} = u(w). \tag{1.1.4}$$

In the case $u(x) = -e^{-\beta x}$, this is equivalent to

$$E\{\exp\{\beta L_P\}\} = 1. \tag{1.1.5}$$

(See similar calculations in 1.3.2, where we had observed that in the exponential case
the solution did not depend on $w$.) In this case, premiums are called exponential.

Certainly, if in (1.1.5) we set $\beta = 0$, then we come to an identity. However, if $\beta \ne
0$ but is close to zero, the exponential premium will be close to the net premium.
Heuristically, it is seen immediately because for small $\beta$, the function $e^{\beta x} \approx 1 + \beta x$,
and substituting this in (1.1.5), we would come to $E\{L_P\} \approx 0$.

To make it more accurate, let us appeal to (1.1.4), and instead of $u(x) = -e^{-\beta x}$,
consider $u_1(x) = \frac{1}{\beta}(1 - e^{-\beta x})$, which people often do. This is just a linear
transformation of the previous utility function, so the solution should be the same (see the
first property of the EUM criterion in Section 1.3.1.2). On the other hand, $u_1(x) \to x$
as $\beta \to 0$. So, in the limit, we deal with the equation $E\{w - L_P\} = w$, which again
leads to $E\{L_P\} = 0$. In Section 1.5, we consider a particular example.
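
The passage from (1.1.4) to (1.1.5), and the behavior for small $\beta$, can be written out explicitly. The second line below is a heuristic sketch that assumes $L_P$ has a finite second moment, an assumption not stated in the text.

```latex
% From (1.1.4) to (1.1.5) for u(x) = -e^{-\beta x}:
E\{u(w - L_P)\} = -e^{-\beta w}\,E\{e^{\beta L_P}\}, \qquad u(w) = -e^{-\beta w}
\;\Longrightarrow\; E\{e^{\beta L_P}\} = 1 .

% Small-beta expansion (assuming E\{L_P^2\} < \infty):
1 = E\{e^{\beta L_P}\} \approx 1 + \beta E\{L_P\} + \tfrac{\beta^2}{2} E\{L_P^2\}
\;\Longrightarrow\; E\{L_P\} \approx -\tfrac{\beta}{2}\,E\{L_P^2\} .
```

Under this approximation the expected loss is slightly negative for small $\beta > 0$, so the exponential premium slightly exceeds the net premium and approaches it as $\beta \to 0$.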




1.2 Benefit premiums: The case of a single risk 


In this section, we follow the equivalence principle and consider level premiums, that is, 
premiums paid at a constant rate. 


1.2.1 Net rate 


Let us consider one contract and denote by $Z$ the present value of the benefit payment.
Let $Y$ be the present value of the premium annuity (the total premium) in the case where
the premiums are paid at unit rate. Then, if $P$ is the level premium rate of the contract, the
present value of the total premium to be paid is $Y_P = PY$.

Thus,

$$L_P = Z - PY, \tag{1.2.1}$$

and the requirement $E\{L_P\} = 0$ amounts to the relation $0 = E\{Z\} - P\,E\{Y\}$, or

$$P = \frac{E\{Z\}}{E\{Y\}}. \tag{1.2.2}$$


The numerator is the APV of a benefit payment, which we explored in Chapter 8. For
example, for the whole life insurance on $(x)$ with a unit benefit, the APV $E\{Z\} = \bar{A}_x$ or $A_x$,
depending on whether the benefit is payable upon death or at the end of the year of death.
If the insurance pays not one but, say, $100,000 upon death, then in the right member of
(1.2.2) we should set $E\{Z\} = 100,000\,\bar{A}_x$.

As another example, we may consider a deferred annuity on $(x)$, say, a pension plan
paying one unit of money at the beginning of each year starting with the age of $x + n$,
provided that the annuitant attains this age. Then the numerator in (1.2.2) is $E\{Z\} = {}_{n|}\ddot{a}_x =
{}_np_x\,v^n\,\ddot{a}_{x+n}$; see Section 9.3.3.

The denominator in (1.2.2) depends on the period of premium payments, which in turn 
depends on the type of the contract. We distinguish the cases of continuous and discrete 
time premium annuities. In the former case, premiums are paid continuously, in the latter— 
at the beginning of each time interval. 

In the case of the deferred annuity above and continuous time, the premiums are paid 
until death or the age x+n, whichever comes first. So, E{Y} = G@,-m. In the discrete time 
model, E{Y} = dix. 

In the case of whole life insurance, the premiums are paid during the whole life, and 
hence E{Y} = ā, in the case of continuous payment, and E{Y} = d, if the premiums are 
paid at the beginning of each year. 

Consider an n-year term life insurance contract. In this case, the premiums are paid only 
the first n years provided that the insured is alive. So, E{Y} = Gym or Gym in the case of 
continuous and discrete payments, respectively. 

The period of premium payments may be shorter than the term of the contract. For
example, if for an n-year term insurance the premium annuity is provided during the first
m < n years, the premium will be equal to the ratio \(A^1_{x:\overline{n}|} / \ddot{a}_{x:\overline{m}|}\).

Let us consider some particular cases. 




EXAMPLE 1. Consider a whole life insurance on (x) with unit benefit payable at the 
moment of death and the continuous premium annuity. In this case, 


\[ P = \frac{\bar{A}_x}{\bar{a}_x}. \qquad (1.2.3) \]

(For now we keep using P without additional indices and signs as a general notation for
premiums. Later, for this particular type of benefit premiums, we will use the notation \(\bar{P}_x\).)

Now, assume that the force of mortality \(\mu_x(t) = \mu\) and the interest rate is \(\delta\). Then \(\bar{A}_x = \mu/(\mu + \delta)\)
and \(\bar{a}_x = 1/(\mu + \delta)\); see Examples 8.1.1-3, 9.1.1-1a. Then (1.2.3) implies \(P = \mu\),
the mortality rate. It is noteworthy that in this case the premium does not depend on \(\delta\).


EXAMPLE 2. Let v = 0.96 and x = 60, and suppose the insured comes from the group for
which \(l_{60} = 91748\), \(l_{61} = 90491\), \(l_{62} = 89142\), \(l_{63} = 87698\), \(l_{64} = 86165\), and \(l_{65} = 84518\).
Find the premium rate for a 5-year term life insurance with a benefit of $100,000. The
premiums are paid at the beginning of each year until the contract termination. Thus, the
rate is

\[ 100{,}000 \times \frac{A^1_{60:\overline{5}|}}{\ddot{a}_{60:\overline{5}|}}. \]

For the above data, in Example 9.3.2-1, we obtained \(\ddot{a}_{60:\overline{5}|} \approx 4.50\) (by providing fairly
lengthy calculations).

In Example 8.2.3.2-2, we computed (applying a recursive procedure) \(A^1_{60:\overline{5}|} \approx 0.069\).

Hence, for a unit benefit, \(P \approx 0.069/4.50 \approx 0.0153\), and the rate is 0.0153 × 100,000 =
1,530. Thus, for a death benefit of 100,000, the insured will pay at most 1,530 × 5 = 7,650.
This is not surprising: the term is short and the probability that the company will pay
nothing is \(l_{65}/l_{60} \approx 0.92\).
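The arithmetic of this example can be retraced directly from the quoted life-table values. The following is a minimal sketch, not the procedure of Examples 8.2.3.2-2 and 9.3.2-1: it sums the defining series instead of using recursion, and all variable names are illustrative. Carrying unrounded intermediate values gives a rate of roughly 1,550, slightly above the 1,530 obtained from the rounded figures 0.069 and 4.50.

```python
# Life-table values l_60, ..., l_65 and discount factor v, as quoted in the example.
l = [91748, 90491, 89142, 87698, 86165, 84518]
v = 0.96
n = 5
benefit = 100_000

# k-year survival probabilities and probabilities of dying in year k+1.
survival = [l[k] / l[0] for k in range(n + 1)]
death_in_year = [(l[k] - l[k + 1]) / l[0] for k in range(n)]

# APV of a unit n-year term insurance, benefit paid at the end of the year of death.
term_apv = sum(v ** (k + 1) * death_in_year[k] for k in range(n))

# APV of a unit premium annuity-due payable for at most n years.
annuity_apv = sum(v ** k * survival[k] for k in range(n))

unit_premium = term_apv / annuity_apv
annual_premium = benefit * unit_premium

print(round(term_apv, 4), round(annuity_apv, 3), round(annual_premium))
```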


EXAMPLE 3 ([151, N5]†). For a special decreasing 15-year term life insurance on
a person aged 30, you are given: mortality follows De Moivre's law with \(\omega = 100\); the
benefit payment is 2000 for the first 10 years and 1000 for the last 5 years; the death benefit
is payable at the end of the year of death; v = 0.95. Calculate the level annual premium.

The remaining lifetime T is uniform on [0,70], and the curtate lifetime K takes on values 
0, ...,69 with equal probabilities 1/70. The insurance may be represented as the sum of the 
following two insurances: a 10-year term insurance with a benefit of 1000, and a 15-year 
term insurance with the same benefit. 

The APV for the first insurance is 


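\[ 1000 \sum_{k=0}^{9} v^{k+1} \cdot \frac{1}{70} = \frac{1000}{70}\, v\, \frac{1 - v^{10}}{1 - v} \approx 108.914. \]

For the second, it is

\[ 1000 \sum_{k=0}^{14} v^{k+1} \cdot \frac{1}{70} = \frac{1000}{70}\, v\, \frac{1 - v^{15}}{1 - v} \approx 145.678. \]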


†Reprinted with permission of the Casualty Actuarial Society.




The APV for the premium annuity-due may be computed by the formula \(\ddot{a}_{x:\overline{n}|} = \sum_{k=0}^{n-1} v^k\, {}_kp_x\)
[see (9.3.2.5)]. In our case, this amounts to
\[ \sum_{k=0}^{14} (1 - k/70)\, v^k \approx 9.806. \]
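As a hedged illustration of the "just software" route, the three APVs of this example and the resulting level premium can be evaluated by summing the series directly. This is only a sketch; the helper name `term_apv` and the variable names are illustrative, not notation from the text.

```python
v = 0.95
lifetime_span = 70             # omega - x = 100 - 30 under De Moivre's law
death_prob = 1 / lifetime_span  # P(K = k) for each k = 0, ..., 69

def term_apv(benefit, years):
    """APV of a term insurance paying `benefit` at the end of the year of death."""
    return benefit * sum(v ** (k + 1) * death_prob for k in range(years))

apv_10 = term_apv(1000, 10)  # 10-year layer
apv_15 = term_apv(1000, 15)  # 15-year layer

# 15-year annuity-due: sum of v^k * kp_x with kp_x = 1 - k/70.
annuity_apv = sum((1 - k / lifetime_span) * v ** k for k in range(15))

premium = (apv_10 + apv_15) / annuity_apv
print(round(apv_10, 3), round(apv_15, 3), round(annuity_apv, 3), round(premium, 2))
```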


(One may use formula (9.4.4) or just software.) Hence,

\[ P \approx \frac{108.914 + 145.678}{9.806} \approx 25.963. \]


EXAMPLE 4. Consider two independent lifetimes \(T_1 = T(x)\) and \(T_2 = T(y)\) with constant
forces of mortality \(\mu_1 = 0.02\) and \(\mu_2 = 0.04\), respectively. Let \(\delta = 0.06\). In the
continuous time case, find the premium rate for the following insurances with unit benefit.


(a) The benefit is payable at the moment of the first death. The force of mortality for
\(T = \min\{T_1, T_2\}\) is \(\mu = \mu_1 + \mu_2 = 0.06\), and in accordance with what we have obtained in
Example 1, the premium \(P = \mu = 0.06\).


(b) The benefit is payable at the moment of the second death. Using again the formula
from Example 8.1.1-3, we have \(\bar{A}_x = \mu_1/(\mu_1 + \delta) = 0.25\). Similarly, \(\bar{A}_y = 0.4\). Now, \(\bar{A}_{xy} =
\mu/(\mu + \delta)\); so, \(\bar{A}_{xy} = 0.5\). By (8.4.2.2), \(\bar{A}_{\overline{xy}} = \bar{A}_x + \bar{A}_y - \bar{A}_{xy} = 0.15\). Then the premium
annuity \(\bar{a}_{\overline{xy}} = (1 - \bar{A}_{\overline{xy}})/\delta = 14.16\ldots\). Thus,

\[ P = \frac{\bar{A}_{\overline{xy}}}{\bar{a}_{\overline{xy}}} \approx 0.0106. \]
(c) The benefit is payable at the moment of the second death, while the premiums are
paid until the first. In this case,

\[ P = \frac{\bar{A}_{\overline{xy}}}{\bar{a}_{xy}}. \]

Using now the result of Example 9.1.1-1a, we have \(\bar{a}_{xy} = 1/(\mu + \delta) = 25/3\). Consequently,

\[ P = \frac{0.15}{25/3} = 0.018. \]
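The three premiums of this example follow from two constant-force identities already used above, \(\bar{A} = \mu/(\mu+\delta)\) and \(\bar{a} = 1/(\mu+\delta)\). A short sketch (function and variable names are illustrative) that retraces cases (a), (b) and (c):

```python
mu_x, mu_y, delta = 0.02, 0.04, 0.06

def insurance_apv(mu):
    """APV of a unit benefit paid at death under a constant force of mortality mu."""
    return mu / (mu + delta)

def annuity_apv(mu):
    """APV of a continuous unit-rate life annuity under a constant force mu."""
    return 1 / (mu + delta)

mu_first = mu_x + mu_y                       # force of mortality of min{T1, T2}
A_first = insurance_apv(mu_first)            # first-death insurance
A_last = insurance_apv(mu_x) + insurance_apv(mu_y) - A_first  # second-death insurance
a_first = annuity_apv(mu_first)              # premiums paid until the first death
a_last = (1 - A_last) / delta                # premiums paid until the second death

premium_a = A_first / a_first   # case (a)
premium_b = A_last / a_last     # case (b)
premium_c = A_last / a_first    # case (c)
print(premium_a, round(premium_b, 4), round(premium_c, 4))
```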


On notation. The traditional notation for premiums may be clarified by the following 
examples, where the expression in the parentheses indicates the type of insurance. 


(a) \(\bar{P}(A_x)\). A whole life insurance with unit benefit payable at the end of the year of death
and premiums paid continuously.

(b) \(P(\bar{A}_x)\). A whole life insurance with unit benefit payable at the time of death and
premiums paid at the beginning of each year.

(c) \({}_mP(\bar{A}_x)\). An m-year payment whole life insurance with unit benefit payable at the
time of death and premiums paid at the beginning of each year but not more than m times.

(d) \(P({}_{m|}\ddot{a}_x)\). An m-year deferred annuity with premiums paid at the beginning of each
year but not more than m times.




(Certainly, in all examples above—and all below—premiums are not paid after the in- 
sured dies.) 

In the case when an insurance and the corresponding premium annuity are either both 
considered in continuous time, or both in discrete, we can simplify notation by denoting 
premiums in the following manner. 


(a) \(P_x\). A whole life insurance with unit benefit payable at the end of the year of death
and premiums paid at the beginning of each year.

(b) \(\bar{P}_x\). A whole life insurance with unit benefit payable at the time of death and premiums
paid continuously.

(c) \({}_mP_x\). A whole life insurance with unit benefit payable at the end of the year of death
and premiums paid at the beginning of each year, but not more than m times.

(d) \(P^1_{x:\overline{n}|}\). An n-year term insurance with unit benefit payable at the end of the year of
death with premiums paid at the beginning of each year.


In conclusion, we systematize some typical cases in the table below. For any insurance, 
we consider the cases of continuous and discrete payments in the same row. The bar over 
P indicates that the premiums are paid continuously. For the whole life insurance, we 
demonstrate the difference between benefits payable upon death and at the end of the year 
of death. For other cases, we consider, for brevity, one type. For the table to look simpler, 
we suppress the notations for premiums mentioned above. 


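TABLE 1.

Type of insurance | Premiums paid continuously | Premiums paid at the beginning of each year
Whole life insurance with a benefit payable at the moment of death | \(\bar{P} = \bar{A}_x/\bar{a}_x\) | \(P = \bar{A}_x/\ddot{a}_x\)
Whole life insurance with a benefit payable at the end of the year of death | \(\bar{P} = A_x/\bar{a}_x\) | \(P = A_x/\ddot{a}_x\)
n-Year term life insurance with a benefit payable at the moment of death | \(\bar{P} = \bar{A}^1_{x:\overline{n}|}/\bar{a}_{x:\overline{n}|}\) | \(P = \bar{A}^1_{x:\overline{n}|}/\ddot{a}_{x:\overline{n}|}\)
n-Year endowment life insurance with a benefit payable at the moment of death | \(\bar{P} = \bar{A}_{x:\overline{n}|}/\bar{a}_{x:\overline{n}|}\) | \(P = \bar{A}_{x:\overline{n}|}/\ddot{a}_{x:\overline{n}|}\)
m-Year payment whole life insurance with a benefit payable at the moment of death | \(\bar{P} = \bar{A}_x/\bar{a}_{x:\overline{m}|}\) | \(P = \bar{A}_x/\ddot{a}_{x:\overline{m}|}\)
m-Year payment n-year term life insurance with a benefit payable at the moment of death | \(\bar{P} = \bar{A}^1_{x:\overline{n}|}/\bar{a}_{x:\overline{m}|}\) | \(P = \bar{A}^1_{x:\overline{n}|}/\ddot{a}_{x:\overline{m}|}\)
m-Year payment n-year term endowment life insurance with a benefit payable at the moment of death | \(\bar{P} = \bar{A}_{x:\overline{n}|}/\bar{a}_{x:\overline{m}|}\) | \(P = \bar{A}_{x:\overline{n}|}/\ddot{a}_{x:\overline{m}|}\)
n-Year pure endowment life insurance | \(\bar{P} = {}_nE_x/\bar{a}_{x:\overline{n}|}\) | \(P = {}_nE_x/\ddot{a}_{x:\overline{n}|}\)
n-Year deferred whole life annuity-due | \(\bar{P} = {}_nE_x\, \ddot{a}_{x+n}/\bar{a}_{x:\overline{n}|}\) | \(P = {}_nE_x\, \ddot{a}_{x+n}/\ddot{a}_{x:\overline{n}|}\)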




1.2.2 The case where “Y is consistent with Z” 


We saw in Section 9.2.2 that each type of level annuity corresponds to a certain type of 
insurance, which is reflected by the formulas 


\[ Y = \frac{1 - Z_Y}{\delta} \quad \text{and} \quad Y = \frac{1 - Z_Y}{d} \]

for the cases of continuous and discrete annuities, respectively. Unlike in the formulas
(9.2.2.1) and (9.2.2.5), here we mark Z by the index Y to avoid confusion: the insurance
that corresponds to Y may not coincide with the insurance whose present value is denoted
by Z in the formulas (1.2.1) and (1.2.2) above. In other words, in general, \(Z_Y \neq Z\).

For example, for an n-year term insurance, the premium annuity paid during the term
[0, n] corresponds to an n-year endowment insurance rather than to an n-year term
insurance; see Section 9.3.2. In particular, in the continuous case, \(E\{Z\} = \bar{A}^1_{x:\overline{n}|}\), while
\(E\{Z_Y\} = \bar{A}_{x:\overline{n}|}\).

Also, for Z to be equal to Zy, the premium annuity and the insurance should both corre- 
spond to the same model with regard to time: either the continuous or discrete time model. 
We will call such cases fully continuous time and fully discrete time, respectively. 

So, we will talk about consistency between Z and Y if Z = Zy. Below, we consider four 
such cases. 

First, there are two types of whole life insurance: fully continuous and fully discrete. In
these cases, \(E\{Z\} = E\{Z_Y\} = \bar{A}_x\) and \(E\{Z\} = E\{Z_Y\} = A_x\), respectively.

The next two cases are two types of n-year endowment insurance: fully continuous
and fully discrete. In these cases, the respective relations are \(E\{Z\} = E\{Z_Y\} = \bar{A}_{x:\overline{n}|}\) and
\(E\{Z\} = E\{Z_Y\} = A_{x:\overline{n}|}\).

When dealing with a consistency model, we may simplify some representations. 

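Let the random time moment \(\Psi\) play the same role as it played in Chapters 8 and 9.
For example, in the continuous-time scheme, for a whole life insurance \(\Psi = T(x)\). For an
n-year endowment insurance, \(\Psi = \min\{T, n\}\). Then, in the case of consistency,

\[ Z = e^{-\delta\Psi} \quad \text{and} \quad Y = \frac{1 - e^{-\delta\Psi}}{\delta} = \frac{1 - Z}{\delta}. \]

(See Sections 8.1.1, 9.2.2.)
Then the loss

\[ L_P = Z - PY = e^{-\delta\Psi} - P\,\frac{1 - e^{-\delta\Psi}}{\delta} = e^{-\delta\Psi}\left(1 + \frac{P}{\delta}\right) - \frac{P}{\delta}. \]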


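Writing \(\bar{P}\) instead of P to emphasize that we are considering the fully continuous time case,
we have

\[ L_{\bar{P}} = e^{-\delta\Psi}\left(1 + \frac{\bar{P}}{\delta}\right) - \frac{\bar{P}}{\delta} = Z\left(1 + \frac{\bar{P}}{\delta}\right) - \frac{\bar{P}}{\delta}. \qquad (1.2.4) \]

We will use both representations in (1.2.4), whichever is more convenient.
Similarly, in the fully discrete time case,

\[ L_P = v^{\Psi}\left(1 + \frac{P}{d}\right) - \frac{P}{d} = Z\left(1 + \frac{P}{d}\right) - \frac{P}{d}. \qquad (1.2.5) \]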




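Above, the premium rates \(\bar{P}\) or P were arbitrary. Let us consider net rates. In the consistency
case, we can make use of the relations

\[ \bar{a} = \frac{1 - \bar{A}}{\delta} \quad \text{or} \quad \ddot{a} = \frac{1 - A}{d}. \qquad (1.2.6) \]

(The indices are skipped since it may concern different types of insurance.) The general
representations for the premium in this case are the following:

\[ P = \frac{A}{\ddot{a}} = \frac{A}{(1 - A)/d} = \frac{dA}{1 - A}, \quad \text{and} \quad \bar{P} = \frac{\bar{A}}{\bar{a}} = \frac{\bar{A}}{(1 - \bar{A})/\delta} = \frac{\delta\bar{A}}{1 - \bar{A}}. \qquad (1.2.7) \]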
Cases in point are presented in the table below. 
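TABLE 2.

Type of insurance | Net premium
Whole life insurance with a benefit payable at the moment of death | \(\bar{P} = \delta\bar{A}_x/(1 - \bar{A}_x)\)
Whole life insurance with a benefit payable at the end of the year of death | \(P = dA_x/(1 - A_x)\)
n-Year endowment life insurance with a benefit payable at the moment of death | \(\bar{P} = \delta\bar{A}_{x:\overline{n}|}/(1 - \bar{A}_{x:\overline{n}|})\)
n-Year endowment life insurance with a benefit payable at the end of the year of death | \(P = dA_{x:\overline{n}|}/(1 - A_{x:\overline{n}|})\)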


EXAMPLE 1. Consider the 5-year endowment insurance from Example 8.2.4.2-1. For
the data from this example, \(A_{60:\overline{5}|} = 0.820\). Then we can immediately write that

\[ P = P_{60:\overline{5}|} = \frac{dA_{60:\overline{5}|}}{1 - A_{60:\overline{5}|}} = \frac{(1 - 0.96) \cdot 0.820}{1 - 0.820} \approx 0.182. \]


1.2.3 Variances 


If Y and Z are consistent in the sense defined above, we can use (1.2.4) and (1.2.5), which
imply that

\[ Var\{L_{\bar{P}}\} = \left(1 + \frac{\bar{P}}{\delta}\right)^2 Var\{Z\} \quad \text{and} \quad Var\{L_P\} = \left(1 + \frac{P}{d}\right)^2 Var\{Z\} \qquad (1.2.8) \]

for the cases of continuous and discrete time, respectively. The variance Var{Z} may be
computed by the double rate rule; see Chapter 8. It is worth emphasizing that formulas
(1.2.8) are true for any premium, not only for the net one.

In the case of the net premium, combining (1.2.6) and (1.2.7), we can write that

\[ 1 + \frac{\bar{P}}{\delta} = \frac{1}{1 - \bar{A}} = \frac{1}{\delta\bar{a}}, \qquad 1 + \frac{P}{d} = \frac{1}{1 - A} = \frac{1}{d\ddot{a}}. \]




Thus, in the case of the benefit (net) premium, we can rewrite (1.2.8) as

\[ Var\{L_{\bar{P}}\} = \frac{1}{(\delta\bar{a})^2} Var\{Z\} = \frac{1}{(1 - \bar{A})^2} Var\{Z\} \quad \text{in the continuous case;} \qquad (1.2.9) \]

\[ Var\{L_P\} = \frac{1}{(d\ddot{a})^2} Var\{Z\} = \frac{1}{(1 - A)^2} Var\{Z\} \quad \text{in the discrete case.} \qquad (1.2.10) \]


EXAMPLE 1. Let us continue the previous Example 1.2.2-1, where we have computed
(using certain data) that the net premium \(P = P_{60:\overline{5}|} = 0.182\ldots\). In Example 8.2.4.2-1, we got
that for the insurance under consideration, \(A_{60:\overline{5}|} = 0.820\) and \(Var\{Z\} = {}^2A_{60:\overline{5}|} - (A_{60:\overline{5}|})^2 \approx
0.0014\). Hence, by (1.2.10), \(Var\{L_P\} \approx \frac{1}{(1 - 0.820)^2} \cdot 0.0014 \approx 0.043\). So, we deal with the loss
L with zero mean and the standard deviation \(\sigma_L \approx 0.208\).
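The numbers of this example are easy to retrace. The sketch below takes the quoted values A = 0.820 and Var{Z} = 0.0014 as given (it does not recompute them from the life table) and also checks that the general formula (1.2.8) and the net-premium formula (1.2.10) agree; the variable names are illustrative.

```python
import math

v = 0.96
d = 1 - v          # discount rate
A = 0.820          # APV of the 5-year endowment insurance, as quoted
var_Z = 0.0014     # variance of the present value Z, as quoted

# Net premium in the fully discrete consistency case, formula (1.2.7).
net_premium = d * A / (1 - A)

# Variance of the loss by (1.2.10) and, as a cross-check, by (1.2.8).
var_loss = var_Z / (1 - A) ** 2
var_loss_general = (1 + net_premium / d) ** 2 * var_Z
sd_loss = math.sqrt(var_loss)

print(round(net_premium, 3), round(var_loss, 3), round(sd_loss, 3))
```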


EXAMPLE 2. Consider the whole life insurance in the case of continuous time and a
constant force of mortality \(\mu\). We saw in Example 1.2.1-1 that, for such an insurance, the
net premium \(\bar{P} = \mu\). In Example 8.1.1-3, we came to

\[ Var\{Z\} = \frac{\mu}{\mu + 2\delta} - \left(\frac{\mu}{\mu + \delta}\right)^2 = \frac{\mu\delta^2}{(\mu + 2\delta)(\mu + \delta)^2}. \]

Then, by (1.2.8),

\[ Var\{L_{\bar{P}}\} = \left(1 + \frac{\mu}{\delta}\right)^2 \frac{\mu\delta^2}{(\mu + 2\delta)(\mu + \delta)^2} = \frac{\mu}{\mu + 2\delta}. \]

Let, say, \(\mu = 0.06\) and \(\delta = 0.06\). Then \(Var\{L_{\bar{P}}\} = 1/3\), and the standard deviation \(\sigma_L \approx
0.58\). So, if the benefit equals $1000, then the loss deviates from zero on the average
approximately by $600. It is certainly too large but we should remember that we consider a
net premium. The situation may change dramatically if we add a sufficient security loading,
which we will see in Section 1.4.2.
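The value 1/3 can also be checked by simulation. The following sketch is only an illustration, not part of the derivation: it draws exponential lifetimes with a fixed seed, evaluates the loss by (1.2.4) with the net premium rate equal to the force of mortality, and compares the sample variance with the closed form. The sample size and seed are arbitrary choices.

```python
import math
import random

mu = delta = 0.06
net_premium = mu  # net premium rate for a constant force of mortality

def loss(t):
    """Loss (1.2.4) for a whole life insurance when the lifetime equals t."""
    z = math.exp(-delta * t)
    return z * (1 + net_premium / delta) - net_premium / delta

rng = random.Random(12345)  # fixed seed so that the run is reproducible
losses = [loss(rng.expovariate(mu)) for _ in range(200_000)]

sample_mean = sum(losses) / len(losses)
sample_var = sum((x - sample_mean) ** 2 for x in losses) / len(losses)
exact_var = mu / (mu + 2 * delta)

print(round(sample_mean, 3), round(sample_var, 3), round(exact_var, 3))
```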




In the general case, computation of variances is more difficult. 

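EXAMPLE 3. Consider an n-year term insurance on (x) with unit benefit payable at
the moment of death. For simplicity, assume that the premium is paid continuously. Set
T = T(x). Since the insurance that corresponds to the premium annuity is the n-year
endowment, the present values

\[ Z = \begin{cases} e^{-\delta T} & \text{if } T \le n, \\ 0 & \text{if } T > n, \end{cases} \quad \text{and} \quad Z_Y = \begin{cases} e^{-\delta T} & \text{if } T \le n, \\ e^{-\delta n} & \text{if } T > n, \end{cases} \]

and they are not the same. Set

\[ Z_1 = \begin{cases} 0 & \text{if } T \le n, \\ e^{-\delta n} & \text{if } T > n. \end{cases} \]

Then \(Z_Y = Z + Z_1\), and

\[ L_P = Z - P\,\frac{1 - Z_Y}{\delta} = Z - P\,\frac{1 - Z - Z_1}{\delta} = Z\left(1 + \frac{P}{\delta}\right) + Z_1\frac{P}{\delta} - \frac{P}{\delta}. \]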


Consequently, 


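\[ Var\{L_P\} = \left(1 + \frac{P}{\delta}\right)^2 Var\{Z\} + \left(\frac{P}{\delta}\right)^2 Var\{Z_1\} + 2\left(1 + \frac{P}{\delta}\right)\frac{P}{\delta}\, Cov\{Z, Z_1\}. \qquad (1.2.11) \]
Since \(Z \cdot Z_1 = 0\),
\[ Cov\{Z, Z_1\} = -E\{Z\}E\{Z_1\} = -\bar{A}^1_{x:\overline{n}|} \cdot {}_nE_x, \qquad (1.2.12) \]

while
\[ Var\{Z\} = {}^2\bar{A}^1_{x:\overline{n}|} - \left(\bar{A}^1_{x:\overline{n}|}\right)^2, \qquad Var\{Z_1\} = v^{2n}\, {}_np_x (1 - {}_np_x). \qquad (1.2.13) \]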


Combining the last three relations, we can compute the variance for any P. 
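To see the three relations at work, one can specialize them to a constant force of mortality. This choice, the parameter values, and the variable names below are assumptions made only for illustration. In that case the n-year term APV is (μ/(μ+δ))(1 − e^{−(μ+δ)n}) and the net premium rate is again μ. The sketch evaluates (1.2.11) with (1.2.12) and (1.2.13) and compares the result with a simulated variance of the loss.

```python
import math
import random

# Illustrative assumptions: constant force of mortality mu, interest rate delta, term n.
mu, delta, n = 0.04, 0.06, 10
P = mu  # net premium rate of the term insurance under a constant force

A1 = mu / (mu + delta) * (1 - math.exp(-(mu + delta) * n))               # term APV
A1_double = mu / (mu + 2 * delta) * (1 - math.exp(-(mu + 2 * delta) * n))  # at rate 2*delta
n_p_x = math.exp(-mu * n)
n_E_x = math.exp(-(mu + delta) * n)

var_Z = A1_double - A1 ** 2                                   # (1.2.13)
var_Z1 = math.exp(-2 * delta * n) * n_p_x * (1 - n_p_x)       # (1.2.13)
cov_Z_Z1 = -A1 * n_E_x                                        # (1.2.12)

c = P / delta
var_formula = (1 + c) ** 2 * var_Z + c ** 2 * var_Z1 + 2 * (1 + c) * c * cov_Z_Z1  # (1.2.11)

def loss(t):
    """Loss Z - P*Y for lifetime t, with Y = (1 - Z_Y)/delta."""
    z = math.exp(-delta * t) if t <= n else 0.0
    z_y = math.exp(-delta * min(t, n))
    return z - P * (1 - z_y) / delta

rng = random.Random(7)  # fixed seed for reproducibility
sample = [loss(rng.expovariate(mu)) for _ in range(200_000)]
mean_mc = sum(sample) / len(sample)
var_mc = sum((x - mean_mc) ** 2 for x in sample) / len(sample)

print(round(var_formula, 4), round(var_mc, 4))
```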


1.2.4 Premiums paid m times a year 


The premiums we now consider are called true fractional and concern the case when
they are paid in m installments of equal size at the beginning of each m-thly period. The
annual premium rate is denoted by \(P^{(m)}\) (with other indices if needed). In other words,
the premium paid at the beginning of each “m-th” is equal to \(P^{(m)}/m\). We do not suppose,
however, any adjustment in paying benefits.

In this case, in the general formula (1.2.2), we should keep the same numerator, while
E{Y} is the corresponding APV of the type \(\ddot{a}^{(m)}\) that we considered in Section 9.5.


EXAMPLE 1. Consider a whole life insurance on (x) with benefits payable upon death,
while premiums are paid m-thly. Then the benefit annual true fractional premium is
\(P^{(m)}_x = \bar{A}_x / \ddot{a}^{(m)}_x\). [The complete traditional notation for such a premium is \(P^{(m)}(\bar{A}_x)\).]

Let us compare it with the annual rate when payments are provided one time a year, that
is, with \(P = \bar{A}_x / \ddot{a}_x\). [The complete notation for such a premium would be \(P(\bar{A}_x)\).]

In Section 9.5, we showed that \(\ddot{a}^{(m)}_x = \alpha(m)\ddot{a}_x - \beta(m)\), where \(\alpha(m)\) and \(\beta(m)\) are defined
in (9.5.6), and

\[ \alpha(m) \approx 1, \qquad \beta(m) \approx \frac{m - 1}{2m}. \qquad (1.2.14) \]

Hence,

\[ P^{(m)}_x \approx \frac{\bar{A}_x}{\ddot{a}_x - \frac{m-1}{2m}} = \frac{P}{1 - \frac{m-1}{2m\,\ddot{a}_x}}. \]

Say, for m = 12, we would have

\[ P^{(12)}_x \approx kP, \quad \text{where the coefficient } k \approx \frac{1}{1 - 11/(24\ddot{a}_x)}. \qquad (1.2.15) \]

Since \(\ddot{a}_x > 1\) (why?), the “correction” coefficient k is not larger than
\(\frac{1}{1 - 11/24} \approx 1.85\), but this is not an accurate bound. As a rule, \(\ddot{a}_x\) is essentially larger than
1, which we saw in many examples. For instance, for \(\ddot{a}_x = 5\) we would have \(k \approx 1.1\), and
\(\ddot{a}_x = 10\) would lead to \(k \approx 1.05\). In any case, if the insured pays monthly, she/he should pay
more. See also Exercise 24.
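A small sketch of the correction coefficient, based on the approximations (1.2.14) rather than on the exact α(m) and β(m) of (9.5.6); the function name is illustrative.

```python
def correction_coefficient(annuity_due_apv, m=12):
    """Approximate ratio of the true fractional premium to the annual premium.

    Uses alpha(m) ~ 1 and beta(m) ~ (m - 1) / (2m) from (1.2.14);
    `annuity_due_apv` is the APV of the annual annuity-due.
    """
    beta = (m - 1) / (2 * m)
    return 1 / (1 - beta / annuity_due_apv)

for apv in (1, 5, 10):
    print(apv, round(correction_coefficient(apv), 3))
```

With these approximations the coefficient decreases toward one as the annuity value grows, which is the pattern described in the example.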




1.2.5 Combinations of insurances 


In practice, one can see a large number of different insurance forms—in particular, var- 
ious combinations of classical insurances we considered above. Nevertheless, the rule for 
net premium calculations for all these policies is the same. One should compute the APV 
of what the company pays, i.e., E{Z}, and the APV of the premium annuity, i.e., E{Yp}. 
Then the equality E{Z} = E{Yp} will be an equation for the premium. 


EXAMPLE 1. (a) An n-year deferred whole life annuity is issued to (x) with the
provision for unit death benefit in the case of death before n. For the premium annuity, we have
\(E\{Y_P\} = P\ddot{a}_{x:\overline{n}|}\), while \(E\{Z\} = {}_{n|}\ddot{a}_x + A^1_{x:\overline{n}|} = v^n\, {}_np_x\, \ddot{a}_{x+n} + A^1_{x:\overline{n}|}\). Thus,
\(P = ({}_nE_x\, \ddot{a}_{x+n} + A^1_{x:\overline{n}|}) / \ddot{a}_{x:\overline{n}|}\).
Certainly, we could also consider it as if the insured bought the life insurance and the
deferred annuity separately. In this case, we would compute the two premiums separately and
would add them up, which would take a bit longer.


(b) Now, in the same deferred annuity plan, let the death benefit (in the case of death
before n) consist in the return of the accumulated premium with interest. For the APV of
the premium annuity, we write the same as above: \(E\{Y_P\} = P\ddot{a}_{x:\overline{n}|}\).

The APV of the return may be computed directly, but one may reason as follows. If the
insurer had returned the premium in any case, the APV would have been equal to \(P\ddot{a}_{x:\overline{n}|}\),
that is, exactly to the APV of what the insured paid. As a matter of fact, the insurer does
not return the premium if the insured attains the age of x + n. On the other hand, the present
value of the series of unit payments until time n is \(\ddot{a}_{\overline{n}|} = 1 + \ldots + v^{n-1} = \frac{1}{d}(1 - v^n)\). (See
also Section 0.8.3.) Hence, the expected present value of the part of the premium, which
the insurer does not return, is \(P \cdot {}_np_x\, \ddot{a}_{\overline{n}|}\), and the APV of the return is \(P(\ddot{a}_{x:\overline{n}|} - {}_np_x\, \ddot{a}_{\overline{n}|})\).

Thus, \(E\{Z\} = v^n\, {}_np_x\, \ddot{a}_{x+n} + P(\ddot{a}_{x:\overline{n}|} - {}_np_x\, \ddot{a}_{\overline{n}|})\). Setting \(E\{Y_P\} = E\{Z\}\), we readily
compute that

\[ P = \frac{v^n\, {}_np_x\, \ddot{a}_{x+n}}{{}_np_x\, \ddot{a}_{\overline{n}|}} = \frac{v^n\, \ddot{a}_{x+n}}{\ddot{a}_{\overline{n}|}}. \qquad (1.2.16) \]


See also Exercise 25. 


(c) (A similar example is contained in [42, 5.6].) Let us consider the case when the
company returns the premium without interest. This means that if K = k < n, the company
pays P(k + 1) units at the end of the year of death. This amounts to an n-year term regularly
increasing insurance with the APV per unit of the premium equal to \((IA)^1_{x:\overline{n}|}\). We considered
the precise formula for \((IA)^1_{x:\overline{n}|}\) in Exercise 8.45.

Thus, \(P\ddot{a}_{x:\overline{n}|} = v^n\, {}_np_x\, \ddot{a}_{x+n} + P\,(IA)^1_{x:\overline{n}|}\), and

\[ P = \frac{{}_nE_x\, \ddot{a}_{x+n}}{\ddot{a}_{x:\overline{n}|} - (IA)^1_{x:\overline{n}|}}. \]


In conclusion, it makes sense again to emphasize that the premiums we considered above 
are net premiums. To take into account the risk that the insurer incurs, a security loading 
should be incorporated into the premium. An explicit (and already known to us) way to do 
that will be discussed in the next section. One implicit way is to compute E{Z} and E{Y} 
in the representation P = E{Z}/E{Y} with different interest rates. Taking a larger rate 




for the premium annuity (in comparison with the rate for the insurance part), we make the 
denominator smaller, and hence the premium larger. This is similar to what we can see in 
any bank: the loan rate is higher than the deposit rate, i.e., the rate of return on customers’ 
investments. 


1.3 Accumulated values 


In this section, we generalize the notion of accumulated value from Section 0.8.4. The 
characteristic at which we will arrive may be interpreted as the expected accumulated value 
of a temporary annuity, given that the annuitant survives the term of the annuity. To explain 
what it means rigorously, we use the notion of net premium, and this is why we consider 
accumulated value in the current chapter concerning premiums. 

We begin with an illustrative but somewhat non-rigorous way of reasoning. Assume that
an individual aged x invests in a fund (for example, a pension fund) a unit of money at the
beginning of each year during n years or until death. If Y is the present value of this series
of payments, then, as we know, \(E\{Y\} = \ddot{a}_{x:\overline{n}|}\). The value of the same investment from the
standpoint of the time t = n is \(S = v^{-n}Y\) (see also Section 0.8.4). If the future lifetime T of
the investor were larger than n with probability one, then Y would have been equal to \(\ddot{a}_{\overline{n}|}\) and S
would have been equal to \(v^{-n}\ddot{a}_{\overline{n}|} = \ddot{s}_{\overline{n}|}\), in accordance with (0.8.4.2).

However, T may be less than n, and the expected value of S is

\[ E\{S\} = v^{-n}E\{Y\} = v^{-n}\ddot{a}_{x:\overline{n}|}. \]

Suppose that the fund consists of the contributions of k independent individuals of the 
same type—namely, all investors begin to invest at the same time, having the same age x 
and the same distribution of the future lifetimes. Let \(S_i\) be the accumulated value for the ith
investor at time t = n, and let \(W = S_1 + \ldots + S_k\).

At the end of the term, the total accumulated sum W will be distributed between the
investors who attain the age of x + n. So, an investor who survives the term will not get
W/k, but rather \(W/N_k\), where \(N_k\) is the number of investors who will be alive at the end of
the term.

Thus, the amount obtained per survivor is

\[ \frac{W}{N_k} = \frac{S_1 + \ldots + S_k}{N_k}. \qquad (1.3.17) \]

Because \(E\{N_k\} = k \cdot {}_np_x\), if we replace the denominator and the numerator in (1.3.17) by
their expected values, we will come to the quantity

\[ \frac{k\, v^{-n}\, \ddot{a}_{x:\overline{n}|}}{k \cdot {}_np_x} = \frac{\ddot{a}_{x:\overline{n}|}}{v^n\, {}_np_x}. \]

The last characteristic is denoted by \(\ddot{s}_{x:\overline{n}|}\). It is called the actuarial accumulated value at
the end of the term of an n-year temporary annuity-due and is interpreted as the expected
accumulated value that is available if the investor survives.
However, the expected value of the ratio of r.v.'s is not equal to the ratio of the expected
values of these r.v.'s. Therefore, the characteristic at which we arrived is not equal to \(E\{W/N_k\}\),
and the interpretation above remains unclear.




To make it more understandable (and more rigorous), let us reason in a slightly different
way. Let a r.v. \(\eta\) be what a particular investor will get at the end of the term, and let T be
the investor's future lifetime. Since \(\eta = 0\) for T < n, we write

\[ E\{\eta\} = E\{\eta \mid T > n\}P(T > n) + 0 \cdot P(T \le n) = E\{\eta \mid T > n\} \cdot {}_np_x. \qquad (1.3.18) \]

The present value of \(\eta\) is \(v^n\eta\). Then the present value of the investor's loss is the r.v.
\(L = Y - v^n\eta\), where Y is the present value of what the investor will pay by time n.

Let us now view the unit investment paid at the beginning of each year as an annual
premium paid in order to get the benefit \(\eta\) at the end of the term. Assume that the benefit
\(\eta\) corresponds to the case when the unit premiums above are benefit premiums. In other
words, we define \(\eta\) as the benefit such that E{L} = 0 in the case of unit premiums. Then
we may write

\[ 0 = E\{L\} = E\{Y\} - v^nE\{\eta\} = \ddot{a}_{x:\overline{n}|} - v^n E\{\eta \mid T > n\}\, {}_np_x. \]

Hence,

\[ E\{\eta \mid T > n\} = \frac{\ddot{a}_{x:\overline{n}|}}{v^n\, {}_np_x} = \ddot{s}_{x:\overline{n}|}. \]

So, the quantity \(\ddot{s}_{x:\overline{n}|}\) is \(E\{\eta \mid T > n\}\), where \(\eta\) is defined as above. Thus, \(\ddot{s}_{x:\overline{n}|}\) is indeed
the expected value of what the investor will get if she/he survives but in the case of benefit
premiums.
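For a numerical feel, the actuarial accumulated value can be compared with the accumulated value of the same payments made with certainty. The sketch below borrows the life-table values and v = 0.96 from Example 1.2.1-2 purely for illustration; that reuse, and the variable names, are assumptions of this sketch rather than something done in the text.

```python
# Life-table values l_60, ..., l_65 and v, reused from Example 1.2.1-2 for illustration.
l = [91748, 90491, 89142, 87698, 86165, 84518]
v = 0.96
n = 5
d = 1 - v

# APV of the n-year temporary annuity-due and the n-year survival probability.
annuity_due = sum(v ** k * l[k] / l[0] for k in range(n))
n_p_x = l[n] / l[0]

# Actuarial accumulated value: annuity APV divided by v^n * np_x.
actuarial_accumulated = annuity_due / (v ** n * n_p_x)

# Accumulated value of the annuity-due certain: v^{-n} * (1 - v^n) / d.
certain_accumulated = (1 - v ** n) / d / v ** n

print(round(actuarial_accumulated, 3), round(certain_accumulated, 3))
```

The actuarial value comes out larger because the contributions of those who die before the end of the term are shared among the survivors.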


1.4 Percentile premiums 
1.4.1 The case of a single risk 


In this section, we show that in the case of a single contract, the percentile premium 
principle may lead to results contradicting common sense. Namely, this principle may 
determine the same premium for two insurances among which one deals with larger losses 
than the other. 

We consider particular examples below but it is important to understand that the existence
of such examples is by no means surprising. We noted already that the percentile principle
coincides, in essence, with the VaR (or quantile) criterion considered in Section 1.1.2.2.
We have shown in Example 1.1.2.2-2 that the function \(q_\gamma(X)\) is not strictly monotone in
the sense that there exist pairs of r.v.'s X and Y such that \(P(X \ge Y) = 1\), \(P(X > Y) > 0\), but
\(q_\gamma(X) = q_\gamma(Y)\).

To make it obvious, consider one more example. 


EXAMPLE 1. Let r.v.'s X and Y be such that \(P(X \ge Y) = 1\),
and their d.f.'s are shown in Fig.1. The graph of \(F_X(x)\)
lies under the graph of \(F_Y(x)\) since \(P(X \le x) \le P(Y \le x)\).
Clearly, \(P(X > Y) > 0\), since otherwise \(P(X = Y) = 1\), and
\(F_X(x)\) would coincide with \(F_Y(x)\). However, for the \(\gamma\)
chosen in Fig.1, \(q_\gamma(X) = q_\gamma(Y)\).


A more detailed discussion may be found in Section 
1.1.2.2. 


FIGURE 1. 




FIGURE 2. The d.f.'s of the losses for the n-year endowment insurance and for the n-year term insurance.


The next example is similar to Example 1 and concerns some actual forms of insurance. 


EXAMPLE 2². For a (remaining) lifetime T = T(x), consider an n-year term and an
n-year endowment insurance on the fully continuous basis (for benefit and premium payments
as well). For our example to be non-trivial, assume that P(T > n) > 0.

For the premiums below, we use the notation \(\pi\) instead of P, in order to avoid confusion
with the probability symbol P.

Set \(\delta = 0\). This does not change the essence of the matter but will make the example
more illustrative. In this case, the present value of a unit payment at time t is one, and the
present value of the premium accumulated by time t is \(\pi t\).

Denote the losses for the insurances under consideration by \(L^{(1)}_\pi\) and \(L^{(2)}_\pi\), respectively.
For the n-year term insurance, we have

\[ L^{(1)}_\pi = L^{(1)}_\pi(T) = \begin{cases} 1 - \pi T & \text{if } T \le n, \\ 0 - \pi n & \text{if } T > n. \end{cases} \qquad (1.4.1) \]

The endowment insurance coincides with the term insurance if \(T \le n\), and it pays one unit
when the insured attains the age of x + n. Thus,

\[ L^{(2)}_\pi = L^{(2)}_\pi(T) = \begin{cases} 1 - \pi T & \text{if } T \le n, \\ 1 - \pi n & \text{if } T > n. \end{cases} \]

The particular forms of these functions are not important for us (for \(\delta > 0\) they will be more
complicated), but what is important is that \(L^{(1)}_\pi(T) = L^{(2)}_\pi(T)\) for \(T \le n\).

If T > n, then \(L^{(1)}_\pi(T)\) takes only one value \(c_1 = -\pi n\), and \(L^{(2)}_\pi(T)\) takes only one value
\(c_2 = 1 - \pi n > c_1\). Thus,

\[ L^{(2)}_\pi \ge L^{(1)}_\pi \quad \text{and} \quad P\left(L^{(2)}_\pi > L^{(1)}_\pi\right) = P(T > n) > 0. \qquad (1.4.2) \]

The d.f.'s of the r.v.'s \(L^{(1)}_\pi(T)\) and \(L^{(2)}_\pi(T)\) are completely determined by the distribution of
T; the typical graphs are shown in Fig.2. These graphs reflect the following facts:

(i) \(L^{(1)}_\pi(0) = L^{(2)}_\pi(0) = 1\); (ii) the d.f.'s make jumps of s = P(T > n) at the points \(c_1\) and
\(c_2\), respectively; (iii) for \(x > c_2\), the d.f.'s coincide.


²This example is based on the idea of Examples 6.2.3-4 in [19].




The requirement \(P(L^{(i)}_\pi(T) > 0) \le \gamma\) is equivalent to the inequality \(P(L^{(i)}_\pi(T) \le 0) \ge g\),
where \(g = 1 - \gamma\). We see from the graphs that, if g > s, then the g-quantiles for both r.v.'s are
the same. This means that for such g's (or \(\gamma\)'s), the VaR approach does not distinguish between the
two insurances under consideration.

In light of (1.4.2), this contradicts common sense and demonstrates the essential
inflexibility of the percentile approach.

> Let us state it in more detail. Consider any \(\gamma \in (0, 1 - s)\) and denote by \(t_\gamma\) the \(\gamma\)-quantile
of T. Since \(\gamma < 1 - s = 1 - P(T > n) = P(T \le n)\), for the \(\gamma\) chosen we have \(t_\gamma \le n\).

From (1.4.1) it follows that for \(L^{(1)}_\pi(T) > 0\) we should have \(T \le n\). In this case, \(L^{(1)}_\pi(T) =
1 - \pi T\), and hence the event \(\{L^{(1)}_\pi(T) > 0\} = \{T < 1/\pi,\ T \le n\}\). Thus, \(P(L^{(1)}_\pi(T) > 0) \le \gamma\)
if and only if \(P(T < 1/\pi,\ T \le n) \le \gamma\). Since \(t_\gamma \le n\), this is equivalent to \(1/\pi \le t_\gamma\), or

\[ \pi \ge \pi_\gamma, \quad \text{where } \pi_\gamma = 1/t_\gamma. \]

Thus, \(\pi_\gamma\) is the minimal acceptable premium for the insurer.

Because \(L^{(1)}_\pi(t) = L^{(2)}_\pi(t)\) for \(t \le n\), and we have chosen \(t_\gamma \le n\), the same \(\pi_\gamma\) is the minimal
acceptable premium for the second insurance. <
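The conclusion can be illustrated numerically. The sketch below assumes, purely for illustration, an exponential lifetime with force of mortality 0.04, a term of 20 years and a level of 0.05; none of these values come from the text. It computes the minimal acceptable premium as the reciprocal of the quantile and then checks by simulation that both insurances have the same probability of a positive loss, although the endowment loss is never smaller than the term loss.

```python
import math
import random

# Illustrative assumptions: exponential lifetime with rate mu, term n, level gamma.
mu, n, gamma = 0.04, 20, 0.05
assert gamma < 1 - math.exp(-mu * n)   # gamma < P(T <= n), as required in the example

t_gamma = -math.log(1 - gamma) / mu    # gamma-quantile of the exponential lifetime
pi_gamma = 1 / t_gamma                 # minimal acceptable premium rate (delta = 0)

def term_loss(t):
    """Loss for the n-year term insurance, formula (1.4.1)."""
    return 1 - pi_gamma * t if t <= n else -pi_gamma * n

def endowment_loss(t):
    """Loss for the n-year endowment insurance."""
    return 1 - pi_gamma * t if t <= n else 1 - pi_gamma * n

rng = random.Random(1)  # fixed seed for reproducibility
lifetimes = [rng.expovariate(mu) for _ in range(100_000)]

p_term = sum(term_loss(t) > 0 for t in lifetimes) / len(lifetimes)
p_endowment = sum(endowment_loss(t) > 0 for t in lifetimes) / len(lifetimes)
never_smaller = all(endowment_loss(t) >= term_loss(t) for t in lifetimes)

print(round(pi_gamma, 3), p_term, p_endowment, never_smaller)
```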


1.4.2 The case of many risks. Normal approximation 


The situation changes dramatically when we consider a portfolio of many risks. In this 
case, the total loss is the sum of the losses coming from separate contracts, and under mild 
conditions, for a large number of contracts, the distribution of the total loss is asymptoti- 
cally normal. As we saw in Section 1.1.2.5, when dealing only with normal r.v.’s, we cannot 
build examples similar to those from the previous section. So, for normal r.v.’s, “everything 
is fine”, and we can use quantiles as a criterion. 

The asymptotic normality of sums of r.v.’s takes place for a large class of dependencies 
between separate terms, but we will restrict ourselves to the independence case. 

Consider a portfolio of n independent contracts of the same type, denoting by \(L^{(i)}\),
i = 1, ..., n, the loss corresponding to the ith contract. By assumption, the r.v.'s \(L^{(i)}\) are i.i.d.
Then, it is natural to assume that the premium rate is the same for all contracts. We will
consider the continuous and discrete time cases simultaneously and denote the premium
rate for both cases by \(\pi\). Then

\[ L^{(i)} = Z_i - \pi Y_i, \qquad (1.4.3) \]

where \(Z_i\) and \(Y_i\) are the present values of the benefit payments and the premium annuity
with unit rate, respectively, for the ith contract.

We will consider the case where \(Y_i\) is consistent with \(Z_i\) as it was defined in Section 1.2.2.
In particular, this means that while the pairs \((Z_i, Y_i)\) are independent (as was assumed),
“inside” each pair, for fixed i, the r.v.'s \(Z_i\), \(Y_i\) are strongly dependent. More precisely,
\(Y_i = (1 - Z_i)/\delta\) or \(Y_i = (1 - Z_i)/d\) in the continuous and discrete time case, respectively.
From these relations, it follows that

\[ L^{(i)} = Z_i\left(1 + \frac{\pi}{\delta}\right) - \frac{\pi}{\delta} \quad \text{or} \quad L^{(i)} = Z_i\left(1 + \frac{\pi}{d}\right) - \frac{\pi}{d}, \qquad (1.4.4) \]
depending on whether we consider the continuous or discrete time case. Below, we will
use the symbol \(\delta\) for both cases when it does not cause confusion.




Let us consider a \(\gamma\)-percentile premium; that is, let us set \(L = L_n = L^{(1)} + \ldots + L^{(n)}\), and
impose the condition
\[ P(L_n > 0) \le \gamma. \qquad (1.4.5) \]

In order to use normal approximation, set \(m = E\{L^{(i)}\}\), \(\sigma^2 = Var\{L^{(i)}\}\), and
\[ L^*_n = \frac{L_n - mn}{\sigma\sqrt{n}}, \]

i.e., \(L^*_n\) is a normalized sum. Formally, we do not require the mean loss m to be negative.
However, we should expect that this will be the case if we choose the premium large enough
for the insurance to be profitable with a large probability.

Furthermore, 


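\[ P(L_n > 0) = P\left(\frac{L_n - mn}{\sigma\sqrt{n}} > -\frac{mn}{\sigma\sqrt{n}}\right) = P\left(L^*_n > -\frac{m\sqrt{n}}{\sigma}\right). \]

Then, for (1.4.5) to hold, we should have \(P\left(L^*_n \le -\frac{m\sqrt{n}}{\sigma}\right) \ge 1 - \gamma\). Therefore, up to the
accuracy of normal approximation, we should have \(-\frac{m\sqrt{n}}{\sigma} \ge q_{1-\gamma,s}\), where \(q_{1-\gamma,s}\) is the
\((1 - \gamma)\)-quantile of the standard normal distribution. Eventually, we write it as

\[ -m \ge \frac{q_{1-\gamma,s}\,\sigma}{\sqrt{n}}. \qquad (1.4.6) \]

From (1.4.3) it follows that \(m = A - \pi a\), where A and a are the respective expectations
\(E\{Z_i\}\) and \(E\{Y_i\}\). Note that in our consistency case, \(a = (1 - A)/\delta\).
From (1.4.4) we get that \(\sigma = H\left(1 + \frac{\pi}{\delta}\right)\), where \(H^2 = Var\{Z_i\}\). Then (1.4.6) may be
rewritten as

\[ \pi a - A \ge \frac{q_{1-\gamma,s}\,H}{\sqrt{n}}\left(1 + \frac{\pi}{\delta}\right). \]

Solving this inequality with respect to \(\pi\), we obtain that

\[ \pi \ge \pi_\gamma = \frac{A + q_{1-\gamma,s}H/\sqrt{n}}{a - q_{1-\gamma,s}H/(\delta\sqrt{n})}, \qquad (1.4.7) \]

provided that the denominator above is positive.
By construction, \(\pi_\gamma\) is the minimal acceptable \(\gamma\)-percentile premium.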
Now, note that the benefit (net) premium is

\[ \pi_{net} = \frac{A}{a}. \]

Let us look at (1.4.7). If \(\gamma < \frac{1}{2}\), then the standard normal quantile \(q_{1-\gamma,s} > 0\), and for H > 0,

\[ \pi_\gamma > \frac{A}{a} = \pi_{net}. \]

We see also that

\[ \pi_\gamma \to \pi_{net} \quad \text{as } n \to \infty. \]

Both facts are quite natural and expected.
For the discrete time case, we should replace \(\delta\) by d in (1.4.7).

EXAMPLE 1. We continue with Example 1.2.3-1 which was the last in the series of 
examples using the same data. In this example, we consider a 5-year endowment insurance 
for which the premium annuity is consistent with the insurance benefits. Time is discrete. 

So far we know that, in the notation of this section, \(A = A_{60:\overline{5}|} = 0.820\), \(\pi_{net} = P_{60:\overline{5}|} =
0.182\), and \(H^2 = Var\{Z\} = 0.0014\) (all with the accuracy chosen). So, \(H \approx 0.037\).

The particular values above were obtained for v = 0.96. Hence, d = 0.04 and \(a = (1 -
A)/d = 4.5\).

Consider n = 100 independent contracts of the same type. First, let \(\gamma = 0.05\). Then
\(q_{1-\gamma,s} \approx 1.64\), and by (1.4.7),

\[ \pi_\gamma \approx \frac{0.82 + 1.64 \cdot 0.037/\sqrt{100}}{4.5 - 1.64 \cdot 0.037/(0.04\sqrt{100})} \approx 0.190, \]

which is 0.008 larger than \(\pi_{net} = 0.182\). For \(\gamma = 0.01\) we should replace 1.64 above by
\(q_{0.99,s} \approx 2.33\), which leads to \(\pi_{0.01} \approx 0.193\).

In our example, the percentile premium is close to the net premium, not only because n 
is large but also since the standard deviation H is small. We noticed earlier that for the data 
we use, the probability that the client dies within the term is small. Therefore, the insurance 
risk is not large. In the next example, the situation will be somewhat different. 
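Formula (1.4.7) is straightforward to code. The following sketch reproduces the two premiums of this example from the quoted values of A and H; the function name and its parameter names are illustrative, and the normal quantile is taken from the standard library instead of being rounded to 1.64 or 2.33.

```python
import math
from statistics import NormalDist

def percentile_premium(A, H, rate, n, gamma):
    """Minimal gamma-percentile premium by (1.4.7) in the consistency case.

    `rate` is delta in the continuous case or d in the discrete case,
    `A` is the APV of the benefit and `H` the standard deviation of Z.
    """
    a = (1 - A) / rate
    shift = NormalDist().inv_cdf(1 - gamma) * H / math.sqrt(n)
    denominator = a - shift / rate
    if denominator <= 0:
        raise ValueError("formula (1.4.7) requires a positive denominator")
    return (A + shift) / denominator

# Values quoted in the example: A = 0.820, H ~ 0.037, d = 0.04, n = 100 contracts.
premium_05 = percentile_premium(A=0.820, H=0.037, rate=0.04, n=100, gamma=0.05)
premium_01 = percentile_premium(A=0.820, H=0.037, rate=0.04, n=100, gamma=0.01)
print(round(premium_05, 3), round(premium_01, 3))
```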


EXAMPLE 2. Consider the whole life insurance on a fully continuous basis with the
constant force of mortality \(\mu = 0.04\) and \(\delta = 0.06\). In this case, \(A = \bar{A}_x = \mu/(\mu + \delta) = 0.4\),
\(a = \frac{1}{\mu + \delta} = 10\), and \(\pi_{net} = \mu = 0.04\). The variance
\(H^2 = \frac{\mu}{\mu + 2\delta} - \left(\frac{\mu}{\mu + \delta}\right)^2 = 0.25 - 0.4^2 = 0.09\), and H = 0.3.

For \(\gamma = 0.05\), we have

\[ \pi_\gamma \approx \frac{0.4 + 1.64 \cdot 0.3/\sqrt{n}}{10 - 1.64 \cdot 0.3/(0.06\sqrt{n})} = \frac{0.4 + 0.492/\sqrt{n}}{10 - 8.2/\sqrt{n}}. \]

It is interesting how \(\pi_\gamma\) is changing in n, and how it is approaching \(\pi_{net} = 0.04\). The table
below illustrates the pattern:

n | 100 | 150 | 200 | 1000 | 4000 | 10000
\(\pi_\gamma\) | 0.0490 | 0.0472 | 0.0462 | 0.0427 | 0.0413 | 0.0408


Actually, the convergence is slow. 
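The calculations of the two examples above may be reproduced by a short program. The sketch below is only an illustration: the function name `percentile_premium` and its argument names are ours, and the formula coded is the one in the form displayed in Examples 1 and 2, with the normal quantile rounded to 1.64 or 2.33 as in the text.

```python
from math import sqrt

def percentile_premium(A, a, H, rate, n, q):
    """Normal-approximation premium in the form used in the examples:
    (A + q*H/sqrt(n)) / (a - q*H/(rate*sqrt(n))).
    `rate` is d in discrete time and delta in continuous time."""
    shift = q * H / sqrt(n)
    return (A + shift) / (a - shift / rate)

# Example 1: 5-year endowment, discrete time, d = 0.04, n = 100 contracts.
ex1_gamma_05 = percentile_premium(A=0.82, a=4.5, H=0.037, rate=0.04, n=100, q=1.64)
ex1_gamma_01 = percentile_premium(A=0.82, a=4.5, H=0.037, rate=0.04, n=100, q=2.33)

# Example 2: whole life, continuous time, delta = 0.06, varying n.
table = {n: percentile_premium(A=0.4, a=10.0, H=0.3, rate=0.06, n=n, q=1.64)
         for n in (100, 150, 200, 1000, 4000, 10000)}

if __name__ == "__main__":
    print(round(ex1_gamma_05, 3), round(ex1_gamma_01, 3))
    for n, premium in table.items():
        print(n, round(premium, 4))
```

Since the quantile is rounded, the last digit of a computed value may differ from the tabulated one.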


The end of Route 1! Route2 = page 523 


522 10. PREMIUMS AND RESERVES 


1.5 Exponential premiums 


The exponential premium is a premium based on the utility equivalence principle with an 
exponential utility function; see Section 1.1. Since this function is concave, the premium 
implicitly involves security loading. Consequently, such a premium should be larger than 
the net premium. However, unlike the percentile premiums considered in the previous 
section, exponential premiums turn out to be too large. 


EXAMPLE 1. Consider a fully continuous whole life insurance on (x) with a constant
force of mortality $\mu$ and interest rate $\delta = 0$. Certainly, such an example is artificial due
to both conditions above, but it illustrates well the essence of the matter. The same
phenomenon takes place in many much more realistic situations.

Let the death benefit be equal to C. Then, since $\delta = 0$, the loss $L_P = C - PT$, where P is
a rate of continuous payment, and T = T(x) is the future lifetime. Then the net premium

$$P_{net} = \frac{C}{E\{T\}}.$$

Consider the exponential premium. In our case, the left member of (1.1.5) is

$$E\{\exp\{\beta L_P\}\} = E\{\exp\{\beta(C - PT)\}\} = e^{\beta C} E\{e^{-\beta PT}\} = e^{\beta C} M_T(-\beta P),$$

where $M_T(z)$ is the m.g.f. of the exponential r.v. T. Since $M_T(z) = 1/(1 - z/\mu)$,

$$E\{\exp\{\beta L_P\}\} = e^{\beta C}\, \frac{1}{1 + \beta P/\mu}.$$

Then the solution to the equation $E\{\exp\{\beta L_P\}\} = 1$ is

$$P = \frac{\mu}{\beta}\left(e^{\beta C} - 1\right). \tag{1.5.1}$$

Because $E\{T\} = 1/\mu$, we can rewrite (1.5.1) as

$$P = \frac{1}{\beta E\{T\}}\left(e^{\beta C} - 1\right) = \frac{C}{E\{T\}} \cdot \frac{1}{\beta C}\left(e^{\beta C} - 1\right) = P_{net}\, k(\beta C),$$

where the function

$$k(z) = \frac{1}{z}\left(e^z - 1\right)$$

may be viewed as the security loading.

We see that the premium is not proportional to the sum insured, and it grows rapidly
as C increases. If $\beta \to 0$, then $k(\beta C) \to 1$, and we come to the net premium, which
has already been discussed at the end of Section 1.1. However, for a large $\beta$ and/or C, the
security loading may be very high. Say, $k(2) \approx 3.19$, so the premium should amount to
more than 300% of the net premium.

The circumstance mentioned imposes a serious limitation on the application of the expo- 
nential approach. 
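To get a feeling for how fast the loading grows, one may tabulate $k(z)$ numerically. The following is a minimal sketch for the setting of Example 1 (exponential lifetime, zero interest); the function names are ours, and the numerical values of $\mu$, $\beta$, and C used in any call are hypothetical.

```python
from math import exp, expm1

def loading(z):
    """Security loading k(z) = (e^z - 1)/z, with the limiting value 1 at z = 0."""
    return 1.0 if z == 0 else expm1(z) / z

def exponential_premium(mu, beta, C):
    """Premium rate P = (mu/beta)(e^{beta*C} - 1) from (1.5.1)."""
    return mu / beta * expm1(beta * C)

def expected_exp_loss(mu, beta, C, P):
    """E exp{beta(C - P*T)} for an exponential T with rate mu."""
    return exp(beta * C) / (1.0 + beta * P / mu)

if __name__ == "__main__":
    for z in (0.01, 0.5, 1.0, 2.0, 4.0):
        print(z, round(loading(z), 4))
```

For any admissible parameters, `expected_exp_loss(mu, beta, C, exponential_premium(mu, beta, C))` should return one up to rounding, which is exactly the equation defining the premium.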

One argument in favor of the exponential principle is that it may serve not for deter- 
mining a premium for each particular case, but rather for figuring out a level after which 




2. Reserves 523 


the company should reinsure the risk. For example, H. U. Gerber in [43, p. 51] writes that
the issue we discuss “may be resolved by the following consideration: Assume that the 
insurer charges 250% of the net premium for all values of C: then the policies with a sum 
insured exceeding [the level corresponding to 250%] require reinsurance: policies with a 
lower sum insured are overcharged, which compensates for the relatively high fixed costs 
of these policies.” 


In any case, in practice the most important and most frequently used approach is based 
on the equivalence principle. The net premium in this case serves as the starting point 
in premium calculations. A safety loading is either added to the net premium explicitly 
(for example, proceeding from a security level, which we did for large portfolios in Section 
1.4.2), or it is incorporated implicitly by varying mortality and interest rates (see the remark 
at the end of Section 1.2.5 and Exercise 20). 


2 RESERVES 
2.1 Definitions and preliminary remarks 


Consider an insurance on (x) with a level premium rate P. Denote by $E_t$ the event that
the policy is still in force at time t after the time of policy issue. For example, for the whole
life insurance, $E_t = \{T(x) > t\}$.

Our next step is to consider the company's future loss not only at t = 0, but at later time
moments as well. Denote by ${}_tL = {}_tL_P$ the present value of the future loss of the company
from the standpoint of time t, given that $E_t$ occurred. The index P indicates that the loss so
defined depends on the premium chosen at the time of policy issue. When it cannot cause
misunderstanding, we will skip this index. The quantity

$${}_tV = {}_tV_P = E\{{}_tL_P \mid E_t\}$$

is said to be the reserve at time t. This is the expected value of what the company needs
in order to fulfil its obligations after time t. For example, in the case of fully continuous
whole life insurance, ${}_0V_P = \bar A_x - P\bar a_x$, while

$${}_tV_P = \bar A_{x+t} - P\bar a_{x+t}. \tag{2.1.1}$$

(Since $E_t$ has occurred, the insured has attained the age of x + t. Consequently, when
computing the future loss with respect to the time t, we do not take into account the premium
amount received before this time. So to say, everything starts as from the very beginning,
and we may view x + t as the initial age.)

The counterpart of (2.1.1) for the discrete time case is

$${}_kV_P = A_{x+k} - P\ddot a_{x+k}. \tag{2.1.2}$$

Sometimes we will omit the index P in ${}_tV_P$ too.




Set $L_P = {}_0L_P$, $V_P = {}_0V_P$. Since $P(E_0) = 1$, the conditional expectation $E\{L_P \mid E_0\}$
equals the unconditional expectation $E\{L_P\}$.
The benefit premium $P_{ben}$ is determined by the condition

$$V_{P_{ben}} = E\{L_{P_{ben}}\} = 0,$$

that is, we require the expected future loss with respect to the initial time to be zero.
However, we should not expect this property to hold at all time moments. In other words, we
should not expect that ${}_tV_{P_{ben}}$ equals 0 for all t.

Let us set ${}_tV_{ben} = {}_tV_{P_{ben}}$, the conditional expected reserve at time t corresponding to the
benefit premium. We call such a reserve a benefit reserve or a net premium reserve.

As an example, let us consider the case (2.1.1). The benefit premium for this insurance
is $P_{ben} = \bar P_x = \bar A_x/\bar a_x$, and hence

$${}_tV_{ben} = \bar A_{x+t} - \frac{\bar A_x}{\bar a_x}\,\bar a_{x+t}.$$

Replacing $\bar A_x$ by $1 - \delta\bar a_x$, and $\bar A_{x+t}$ by $1 - \delta\bar a_{x+t}$, after very simple algebra we get that

$${}_tV_{ben} = 1 - \frac{\bar a_{x+t}}{\bar a_x}. \tag{2.1.3}$$

Because it is usually the case (though not always (!), see Exercises 9.4, 8.1b) that $\bar a_{x+t} < \bar a_x$,
in typical situations

$${}_tV_{ben} > 0 \quad \text{for } t > 0. \tag{2.1.4}$$


This is a desirable property, and it reflects the essence of the matter. In the case (2.1.4), the 
expected discounted loss of the insurer is positive, and hence the expected discounted loss 
of the insured is negative. Thus, on the average, the insured will get more than she/he will 
pay, which is a ground for the insured not to terminate the contract. 

The traditional notation for the benefit reserves (or net premium reserves) inherits the
same logic that was applied for the premium notation. For example,

${}_t\bar V(\bar A_x)$ is the benefit reserve at the moment t in the case of a fully continuous whole life
insurance on (x);

${}_kV_x$ is the benefit reserve at an integer moment k in the case of a fully discrete whole life
insurance;

${}_kV_{x:\overline{n}|}$ is the same for an n-year endowment;

${}_kV^1_{x:\overline{n}|}$ is the same for an n-year term insurance;

${}_kV(\bar A_{x:\overline{n}|})$ is the benefit reserve at a moment k in the case of an n-year endowment with
the discrete type of premium payment and a benefit payable upon death.

Below, we will sometimes use this notation and sometimes prefer another symbolism, 
whichever is more convenient. 


2.2 Examples of direct calculations 


EXAMPLE 1 ([151, N6]*). For a fully continuous whole life insurance on (40), you
are given: the level annual premium is $66 payable for the first 20 years; the death
benefit is $2000 for the first 20 years and $1000 thereafter; $\delta = 0.06$; $1000\bar A_{50} = 333.33$;
$1000\bar A^1_{50:\overline{10}|} = 197.81$; and $1000\,{}_{10}E_{50} = 406.57$. Calculate ${}_{10}V$ for the premium given.


Reprinted with permission of the Casualty Actuarial Society. 




Since we compute the reserve needed when the insured attains the age of 50, the fact that
now she/he is aged 40 does not matter. Assuming that the insured has attained the age of
50, we represent the insurance as a combination of two contracts with level payments: the
whole life insurance with a benefit of $1000, and the 10-year term insurance with the same
benefit. The total APV of this combination is

$$1000\bar A_{50} + 1000\bar A^1_{50:\overline{10}|} = 333.33 + 197.81 = 531.14.$$

After t = 10, the insured pays only 10 years more. The APV of the premium annuity
starting at t = 10 is the premium multiplied by $\bar a_{50:\overline{10}|} = \frac{1}{\delta}\left(1 - \bar A_{50:\overline{10}|}\right)$. Now,
$\bar A_{50:\overline{10}|} = \bar A^1_{50:\overline{10}|} + {}_{10}E_{50} = 604.38/1000$. So, $\bar a_{50:\overline{10}|} = \frac{1}{0.06}(1 - 0.60438) \approx 6.5936$. Thus, for P = 66,
the APV of the premium annuity is $66 \cdot 6.5936 \approx 435.182$.
Eventually, ${}_{10}V = 531.14 - 435.182 = 95.958$.
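The arithmetic of this example is easy to check by machine. The sketch below merely repeats the steps of the solution with the data given in the problem; the variable names are ours.

```python
# Data of the problem, per unit of benefit.
delta = 0.06
A50 = 0.33333            # whole life insurance on (50)
A50_term10 = 0.19781     # 10-year term insurance on (50)
E10_50 = 0.40657         # 10-year pure endowment on (50)
premium = 66.0

# Benefits after age 50: 1000 for the whole life plus 1000 for 10 more years.
benefit_apv = 1000 * A50 + 1000 * A50_term10

# Remaining premiums: a 10-year temporary continuous annuity.
A50_endowment10 = A50_term10 + E10_50
annuity = (1 - A50_endowment10) / delta

reserve_10 = benefit_apv - premium * annuity

if __name__ == "__main__":
    print(round(benefit_apv, 2), round(annuity, 4), round(reserve_10, 3))
```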


EXAMPLE 2. (The reader who skipped the multiple decrement scheme should omit
this example.) A special fully discrete 3-year term insurance on (50) follows a double
decrement model, with decrement 1 corresponding to accidental death and decrement 2
corresponding to all other causes. The probability distribution is specified by the following
table:

x           | 50    | 51    | 52
$q_x^{(1)}$ | 0.004 | 0.004 | 0.004
$q_x^{(2)}$ | 0.02  | 0.03  | 0.04

The death benefit is 3000 for accidental deaths and 1000 for the second decrement; v = 0.97.

Find ${}_1V_{ben}$ and ${}_1V = {}_1V_P$ if, starting from t = 1, the company charges the premium 10%
larger than the benefit premium: $P = 1.1P_{ben}$.

First, we compute the APVs of the benefit and premium payments starting from t = 1.
Denoting the present value of the corresponding benefit payment by $Z_1$, we have

$$E\{Z_1\} = v\left(3000q_{51}^{(1)} + 1000q_{51}^{(2)}\right) + v^2\left[1 - \left(q_{51}^{(1)} + q_{51}^{(2)}\right)\right]\left(3000q_{52}^{(1)} + 1000q_{52}^{(2)}\right) \approx 88.0033. \tag{2.2.1}$$

Let $Y_1$ be the present value of the premium payment at unit rate, starting from t = 1. Then

$$E\{Y_1\} = 1 + v\left[1 - \left(q_{51}^{(1)} + q_{51}^{(2)}\right)\right] = 1.93702. \tag{2.2.2}$$
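The total APV of the benefit payment equals

$$E\{Z\} = v\left(3000q_{50}^{(1)} + 1000q_{50}^{(2)}\right) + v\left[1 - \left(q_{50}^{(1)} + q_{50}^{(2)}\right)\right]E\{Z_1\} \approx 114.3545,$$

while for the total premium with unit rate,

$$E\{Y\} = 1 + v\left[1 - \left(q_{50}^{(1)} + q_{50}^{(2)}\right)\right]E\{Y_1\} = 1 + 0.97 \cdot 0.976 \cdot 1.93702 \approx 2.8338.$$

The benefit premium

$$P_{ben} = \frac{E\{Z\}}{E\{Y\}} \approx 40.3535.$$

The premium with the 10% loading is $P = 1.1P_{ben} \approx 44.3889$.

Thus, ${}_1V_{ben} = E\{Z_1\} - P_{ben}E\{Y_1\} \approx 88.0033 - 40.3535 \cdot 1.93702 \approx 9.838$, while
${}_1V_P = E\{Z_1\} - P \cdot E\{Y_1\} \approx 88.0033 - 44.3889 \cdot 1.93702 \approx 2.021$. So, with the 10% loading
the reserve at t = 1 is much smaller than the benefit reserve but still positive. If the loading
were large enough to make this reserve negative, it would be reasonable for the insured not
to renew the contract at time t = 1, if that were possible.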
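The APVs in this example satisfy a simple one-year recursion, so they are convenient to compute by a program. The sketch below uses the data of the table; the helper names (`benefit`, `survive`, `apvs`) are ours and are introduced only for illustration.

```python
v = 0.97
q1 = {50: 0.004, 51: 0.004, 52: 0.004}   # decrement 1: accidental death
q2 = {50: 0.02, 51: 0.03, 52: 0.04}      # decrement 2: all other causes

def benefit(age):
    """Expected benefit paid at the end of the year starting at `age`."""
    return 3000 * q1[age] + 1000 * q2[age]

def survive(age):
    """Probability that the policy stays in force through the year."""
    return 1 - (q1[age] + q2[age])

def apvs(age, last_age=52):
    """Return (APV of benefits, APV of unit premiums) from `age` on."""
    if age > last_age:
        return 0.0, 0.0
    z_next, y_next = apvs(age + 1, last_age)
    z = v * benefit(age) + v * survive(age) * z_next
    y = 1 + v * survive(age) * y_next
    return z, y

z0, y0 = apvs(50)          # E{Z}, E{Y}
z1, y1 = apvs(51)          # E{Z_1}, E{Y_1}
p_ben = z0 / y0            # benefit premium
reserve_ben = z1 - p_ben * y1
reserve_loaded = z1 - 1.1 * p_ben * y1

if __name__ == "__main__":
    print(z1, y1, z0, y0, p_ben, reserve_ben, reserve_loaded)
```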


2.3 Formulas for some standard types of insurance 


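In the "consistency case" described in Section 1.2.2, we can proceed as we did when
deriving (2.1.3). Consider, for example, the fully discrete n-year endowment insurance. In
this case, $P_{ben} = A_{x:\overline{n}|}/\ddot a_{x:\overline{n}|}$, and

$${}_kV_{ben} = A_{x+k:\overline{n-k}|} - P_{ben}\,\ddot a_{x+k:\overline{n-k}|} = A_{x+k:\overline{n-k}|} - \frac{A_{x:\overline{n}|}}{\ddot a_{x:\overline{n}|}}\,\ddot a_{x+k:\overline{n-k}|}. \tag{2.3.1}$$

On the other hand, $1 = A_{x:\overline{n}|} + d\ddot a_{x:\overline{n}|}$ for any x and n, which implies that
$A_{x+k:\overline{n-k}|} = 1 - d\ddot a_{x+k:\overline{n-k}|}$ and $A_{x:\overline{n}|} = 1 - d\ddot a_{x:\overline{n}|}$. After substitution into (2.3.1), some like terms cancel, and
following the traditional notation, we can write that

$${}_kV_{x:\overline{n}|} = 1 - \frac{\ddot a_{x+k:\overline{n-k}|}}{\ddot a_{x:\overline{n}|}}. \tag{2.3.2}$$

To consider the fully discrete whole life insurance, it suffices to let $n \to \infty$, which leads
to

$${}_kV_x = 1 - \frac{\ddot a_{x+k}}{\ddot a_x}. \tag{2.3.3}$$

We have already obtained the similar formula for the fully continuous case in (2.1.3). In
the traditional notation, we write it as

$${}_t\bar V(\bar A_x) = 1 - \frac{\bar a_{x+t}}{\bar a_x}. \tag{2.3.4}$$

The counterpart of (2.3.2) is derived in exactly the same manner, and it looks as
follows:

$${}_t\bar V(\bar A_{x:\overline{n}|}) = 1 - \frac{\bar a_{x+t:\overline{n-t}|}}{\bar a_{x:\overline{n}|}}. \tag{2.3.5}$$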


EXAMPLE 1 ([159, N15]*). You are given: An individual life aged 40 purchases a fully
discrete whole life insurance policy with a death benefit of $10,000; i = 0.05; $p_{40} = 0.98$;
$\ddot a_{40} = 12$; premiums have been calculated according to the equivalence principle.
Calculate the benefit reserve at time t = 1.

We have $\ddot a_{40} = 1 + vp_{40}\ddot a_{41}$, and $v = \frac{1}{1+i} = \frac{1}{1.05}$. Hence,

$$\ddot a_{41} = \frac{12 - 1}{(1/1.05) \cdot 0.98} \approx 11.7857.$$


Reprinted with permission of the Casualty Actuarial Society. 




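Then, by (2.3.3), the benefit reserve

$$10{,}000 \cdot {}_1V_{40} = 10{,}000 \times \left(1 - \frac{\ddot a_{41}}{\ddot a_{40}}\right) \approx 10{,}000 \times \left(1 - \frac{11.7857}{12}\right) \approx 178.6.$$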
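The two steps of this solution (one step of the annuity recursion, then formula (2.3.3)) take a few lines of code. This is a minimal sketch with our own variable names.

```python
i = 0.05
p40 = 0.98
a40 = 12.0           # annuity-due value at age 40
benefit = 10_000

v = 1 / (1 + i)
# Invert the recursion a40 = 1 + v * p40 * a41.
a41 = (a40 - 1) / (v * p40)
# Benefit reserve at time 1 by (2.3.3), scaled by the death benefit.
reserve_1 = benefit * (1 - a41 / a40)

if __name__ == "__main__":
    print(round(a41, 4), round(reserve_1, 1))
```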


EXAMPLE 2. Consider the fully continuous life insurance with the force of mortality
$\mu(t) = \mu$.

(a) In the case of the whole life insurance, $\bar a_x = 1/(\mu + \delta)$, and from (2.3.4) it follows
that ${}_t\bar V(\bar A_x) = 0$ for all t. This is not surprising in view of the memoryless property and, as a
consequence, of the fact that the benefit premium $\bar P_x = \mu$ and does not depend on x. [See
Examples 9.1.1 and 1.2.1-1.]

(b) We should not, however, expect the same for the n-year endowment insurance. For
example, if t is close to n, the premium remaining to be paid is small (since the period [t, n]
is short), while the company still has an obligation to pay a unit of money, not depending
on whether or not the insured survives n years after the time of policy issue.

Using, for example, (9.3.2.4), we have

$$\bar a_{x:\overline{n}|} = \int_0^n e^{-\delta t}e^{-\mu t}\,dt = \frac{1}{\mu + \delta}\left[1 - e^{-(\mu+\delta)n}\right],$$

and hence, by (2.3.5),

$${}_t\bar V(\bar A_{x:\overline{n}|}) = 1 - \frac{1 - e^{-(\mu+\delta)(n-t)}}{1 - e^{-(\mu+\delta)n}} = \frac{e^{(\mu+\delta)t} - 1}{e^{(\mu+\delta)n} - 1}.$$

As expected, when t runs from 0 to n, the last expression is strictly increasing from 0 (the
reserve at the initial time) to 1 (the reserve before the final payment of one).
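The behavior of the reserve in part (b) can also be checked numerically. In the sketch below, the function names and the parameter values ($\mu = 0.02$, $\delta = 0.05$, n = 20) are hypothetical and serve only as an illustration; `reserve` implements (2.3.5) directly, and `reserve_closed` implements the final expression.

```python
from math import exp, expm1

def annuity(mu, delta, n):
    """Temporary continuous annuity under a constant force of mortality."""
    return (1 - exp(-(mu + delta) * n)) / (mu + delta)

def reserve(t, n, mu, delta):
    """Benefit reserve of the n-year endowment by (2.3.5)."""
    if t >= n:
        return 1.0   # limiting value: the unit payment is due
    return 1 - annuity(mu, delta, n - t) / annuity(mu, delta, n)

def reserve_closed(t, n, mu, delta):
    """The same reserve in closed form."""
    return expm1((mu + delta) * t) / expm1((mu + delta) * n)

if __name__ == "__main__":
    for t in range(0, 21, 5):
        print(t, round(reserve(t, 20, 0.02, 0.05), 4))
```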


2.4 Recursive relations 


Let us consider a general fully discrete insurance on (x). We denote by $P_k$ the premium
paid at a time moment k = 0, 1, ..., and $c_{k+1}$ denotes the benefit paid at the end of the period
[k, k + 1] if the contract terminates in this period. All other notations are standard. Assume
that the insured has attained the age of x + k. As in Section 2.1, set $E_k = \{T(x) > k\}$.

We use the first step approach, considering two cases: when the insured lives at least
one year more, and when she/he does not. The corresponding formula for total expectation
looks as follows:

$$\begin{aligned}
{}_kV = E\{{}_kL \mid E_k\} &= E\{{}_kL \mid T(x) > k+1, E_k\}\,P(T(x) > k+1 \mid E_k) \\
&\quad + E\{{}_kL \mid T(x) \le k+1, E_k\}\,P(T(x) \le k+1 \mid E_k) \\
&= E\{{}_kL \mid T(x+k) > 1\}\,p_{x+k} + E\{{}_kL \mid 0 < T(x+k) \le 1\}\,q_{x+k}.
\end{aligned}$$

If T(x + k) > 1, then there will be no benefit payment at the moment k + 1, and before the
receipt of the next premium $P_{k+1}$, the future loss at t = k + 1 equals ${}_{k+1}L$. The discounted
value of this loss is $v \cdot {}_{k+1}L$. Taking into account the premium $P_k$ paid at the time t = k, we
have

$$E\{{}_kL \mid T(x+k) > 1\} = E\{v \cdot {}_{k+1}L - P_k \mid T(x+k) > 1\} = v \cdot {}_{k+1}V - P_k.$$




If $T(x+k) \le 1$ (the case of exact equality may be disregarded), then there will
be a payment of $c_{k+1}$ at the moment k + 1, and the contract will be terminated. Hence,

$$E\{{}_kL \mid T(x+k) \le 1\} = vc_{k+1} - P_k.$$

Thus, ${}_kV = (v \cdot {}_{k+1}V - P_k)p_{x+k} + (vc_{k+1} - P_k)q_{x+k}$, which may be written as

$${}_kV + P_k = v\left(c_{k+1}q_{x+k} + {}_{k+1}V \cdot p_{x+k}\right). \tag{2.4.1}$$


Proceeding from (2.4.1), one can use the forward and backward recursion. If the
premiums are determined by the equivalence principle, we set ${}_0V = 0$ and may move forward
starting from t = 0. If the contract can last at most n years, we can first calculate ${}_nV$ and
move backward starting from t = n.


EXAMPLE 1. Let $p_x = 0.95$, the benefit payment in the first year (if any) be 1,000, the
premiums (which may vary) correspond to the equivalence principle, and $P_0 = 100$. Find
the reserve at the beginning of the second year after the time of policy issue for v = 0.97.

The problem is simple. Since we proceed from the equivalence principle, ${}_0V = 0$, and
from (2.4.1) it follows that

$$0 + 100 = 0.97\left(1000 \cdot 0.05 + {}_1V \cdot 0.95\right).$$

Hence, ${}_1V \approx 55.89$.


EXAMPLE 2 ([152, N3]⁵). For a fully discrete 10-year deferred whole life insurance
of $1000 on (40), you are given: v = 0.95, $p_{48} = 0.98077$, $p_{49} = 0.98039$, $A_{50} = 0.35076$.
The annual benefit premium of $23.4 is payable during the deferral period. Calculate ${}_8V$,
the benefit reserve at time t = 8 right before the premium payment.

Since there is no premium payment after t = 10, the reserve ${}_{10}V = 1000A_{50}$. Since the
company pays nothing if the insured dies within the interval [49, 50), the payment $c_{10} = 0$,
and in accordance with (2.4.1),

$${}_9V = -P + v \cdot {}_{10}V \cdot p_{49} = -23.4 + 0.95 \cdot 0.98039 \cdot 1000 \cdot 0.35076 \approx 303.28752.$$

Similarly,

$${}_8V = -P + v \cdot {}_9V \cdot p_{48} = -23.4 + 0.95 \cdot 0.98077 \cdot 303.2875166 \approx 259.18253.$$
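Both directions of the recursion (2.4.1) are one-line computations. The sketch below (with our own function names) carries out the backward steps of Example 2 and the forward step of Example 1, the latter with the benefit of 1000 that appears in its displayed equation.

```python
def reserve_before(V_next, premium, v, p, benefit=0.0):
    """Backward step of (2.4.1): the reserve kV computed from (k+1)V."""
    return v * (benefit * (1 - p) + V_next * p) - premium

def reserve_after(V_now, premium, v, p, benefit=0.0):
    """Forward step of (2.4.1): the reserve (k+1)V computed from kV."""
    return ((V_now + premium) / v - benefit * (1 - p)) / p

# Example 2: move backward from 10V = 1000 * A_50; no death benefit
# is payable during the deferral period.
V10 = 1000 * 0.35076
V9 = reserve_before(V10, premium=23.4, v=0.95, p=0.98039)
V8 = reserve_before(V9, premium=23.4, v=0.95, p=0.98077)

# Example 1: move forward from 0V = 0.
V1 = reserve_after(0.0, premium=100.0, v=0.97, p=0.95, benefit=1000.0)

if __name__ == "__main__":
    print(round(V9, 5), round(V8, 5), round(V1, 2))
```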
Substituting $p_{x+k} = 1 - q_{x+k}$, we may rewrite (2.4.1) as

$${}_kV + P_k = v\left[{}_{k+1}V + \left(c_{k+1} - {}_{k+1}V\right)q_{x+k}\right]. \tag{2.4.2}$$

The amount $c_{k+1} - {}_{k+1}V$ is called a net amount at risk in the period [k, k + 1].


EXAMPLE 3. (Similar examples are contained, for instance, in [19, Example 8.3.2], 
[154, N30], [154, N10].) A special fully discrete n-year term insurance on (x) pays a death 
benefit of one unit plus the benefit reserve at the end of the year of death, provided that the 


⁵Reprinted with permission of the Casualty Actuarial Society.




insured dies before attaining age x + n. Given the probabilities $q_{x+k}$ and a discount v, find
the level benefit premium.

From the conditions of the problem it follows that ${}_nV = 0$, and $c_{k+1} = {}_{k+1}V + 1$ for
k = 0, ..., n − 1 if K(x) < n.

Then the formula (2.4.2) implies that ${}_kV + P = v\left({}_{k+1}V + q_{x+k}\right)$ for k = 0, ..., n − 1, or

$${}_kV = -P + v \cdot {}_{k+1}V + vq_{x+k}. \tag{2.4.3}$$

Since we are looking for the benefit premium, ${}_0V$ should be equal to zero, so in this problem
we may either move forward starting from ${}_0V = 0$, or move backward starting from ${}_nV = 0$.
Let us choose the former way. Applying (2.4.3) consecutively in each step, we have

$$\begin{aligned}
0 = {}_0V &= -P + v \cdot {}_1V + vq_x = -P + v\left(-P + v \cdot {}_2V + vq_{x+1}\right) + vq_x \\
&= -P - vP + v^2 \cdot {}_2V + v^2q_{x+1} + vq_x \\
&= -P - vP + v^2\left(-P + v \cdot {}_3V + vq_{x+2}\right) + v^2q_{x+1} + vq_x \\
&= -P - vP - v^2P + v^3 \cdot {}_3V + v^3q_{x+2} + v^2q_{x+1} + vq_x = \dots \\
&= -P - vP - \dots - v^{n-1}P + v^n \cdot {}_nV + v^nq_{x+n-1} + \dots + v^2q_{x+1} + vq_x \\
&= -P\left(1 + \dots + v^{n-1}\right) + vq_x + v^2q_{x+1} + \dots + v^nq_{x+n-1},
\end{aligned}$$

because ${}_nV = 0$. Solving it for P, we get that

$$P = v\,\frac{q_x + vq_{x+1} + \dots + v^{n-1}q_{x+n-1}}{1 + v + \dots + v^{n-1}}. \tag{2.4.4}$$

Certainly, we can write that the denominator equals $(1 - v^n)/(1 - v)$, but (2.4.4) reflects the
logic of the answer: if in the numerator of the fraction we replace all q's by one, then we will come to the
denominator. Consider two particular cases.

(a) Let the mortality force $\mu(x) = \mu$. Then all $q_{x+k} = q = 1 - e^{-\mu}$, and (2.4.4) leads to
P = vq. In this case, from (2.4.3) it follows that ${}_kV = -vq + v \cdot {}_{k+1}V + vq = v \cdot {}_{k+1}V$, and
because ${}_nV = 0$, we have ${}_kV = 0$ for all k = 0, ..., n. We could predict this proceeding from
the memoryless property.

(b) Let n = 2. We know that ${}_0V = 0$ and ${}_2V = 0$. Let us find ${}_1V$. From (2.4.4) it follows
that

$$P = v\,\frac{q_x + vq_{x+1}}{1 + v}.$$

Substituting it into (2.4.3), we have

$${}_1V = -P + vq_{x+1} = -v\,\frac{q_x + vq_{x+1}}{1 + v} + vq_{x+1} = v\,\frac{q_{x+1} - q_x}{1 + v}.$$

As was already discussed, if ${}_1V > 0$, the insured has an interest in renewing the insurance.
We see that this is the case if and only if $q_{x+1} - q_x > 0$. The former condition is equivalent
to the growth of the mortality force. In reality, the last condition is true at least for large
and moderate x's.
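Formula (2.4.4) and the recursion (2.4.3) translate directly into code. The following sketch assumes the mortality probabilities are supplied as a list `q` with `q[k]` standing for $q_{x+k}$; the function names and the numerical values used for illustration are ours.

```python
def level_premium(q, v):
    """Level benefit premium (2.4.4); q[k] stands for q_{x+k}, k = 0..n-1."""
    n = len(q)
    numerator = sum(v ** (k + 1) * q[k] for k in range(n))
    denominator = sum(v ** k for k in range(n))
    return numerator / denominator

def reserves(q, v):
    """Backward recursion (2.4.3) from nV = 0; returns [0V, 1V, ..., nV]."""
    n = len(q)
    P = level_premium(q, v)
    V = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        V[k] = -P + v * V[k + 1] + v * q[k]
    return V

if __name__ == "__main__":
    # Hypothetical data: increasing mortality over three years.
    print(level_premium([0.02, 0.03, 0.04], 0.97))
    print(reserves([0.02, 0.03, 0.04], 0.97))
```

With the benefit premium, the backward recursion must return ${}_0V = 0$ up to rounding, which serves as a check of (2.4.4).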




3 EXERCISES 


Section 1

1. In the situation of Section 1.2.1, show that if the benefit provided by an insurance contract
increases by k percent, then the benefit premium increases in the same proportion, no matter
whether the benefits are level or varying.

2. Write the traditional notation for all premiums considered in Table 1.2.1-1.

3. Consider the benefit premiums in the fully discrete case for (i) an n-year term insurance;
(ii) an n-year pure endowment; (iii) the n-year endowment. Which premium is the largest?
Write the relation between these three premiums. Clarify heuristically when the premium in
the case (ii) is larger than in the case (i), and vice versa. Consider the same problem for the
fully continuous case.

4. In each of the following, tell which quantity is larger: (i) P, or Py; (ii) Pum or Pem, (iii) Py or
Pm, (iv) Phi or Pym

5. Without any calculations, just proceeding from common sense, write the limits for all
premiums in Table 1.2.1-1 as $\delta \to 0$. Clarify how to justify your answers rigorously.

6. As was told in Section 8.2.4.1, the traditional notation for the APV of the pure endowment
insurance is $A_{x:\overline{n}|}^{\;\;\,1}$ or ${}_nE_x$. The benefit premium is denoted by $P_{x:\overline{n}|}^{\;\;\,1}$.

(a) Clarify why the formula $P_{x:\overline{n}|} = P^{1}_{x:\overline{n}|} + P_{x:\overline{n}|}^{\;\;\,1}$ is obvious from an economic point of view.
Prove it by using Table 1.2.1-1.
(b) Show that ${}_nP_x = P^{1}_{x:\overline{n}|} + P_{x:\overline{n}|}^{\;\;\,1}A_{x+n}$. (For the definition of ${}_nP_x$, see the remarks on notation
in Section 1.2.1.)

7. An insurance company adds a 10% security loading to net premiums. You know that the
company charges 20-year-old clients the following annual premiums per unit benefit: (a)
0.044 for a 30-year payment whole life insurance; (b) 0.011 for a 30-year term insurance; (c)
0.055 for the 30-year endowment. Find $A_{50}$.

8. Consider two clients such that for the same $\delta$'s, the APV of the whole life annuity for the first
client is larger than that for the second. Who should pay more for the whole life insurance?

9. A 50-year-old client buys a whole life insurance with unit benefit payable at the end of the
year of death and with premiums paid at the beginning of each year. (a) Using the Illustrative
Table, find the benefit premium. (b) Assume that, when calculating the actual premium, the
actuary of the company adds 5% to the benefit premium. Estimate the probability that the
company will make a profit dealing with 100 independent clients of the above type.

10. Using the Illustrative Table, find the benefit premiums in the case of 50-year-old clients for
the 30-year term life insurance.

11. Solve the problem of Example 1.2.1-3 for the case when $\mu(x) = 0.01$.

12. Solve the problem of Example 1.2.1-4 for the case when both independent lifetimes are
uniformly distributed on [0, 50]. (Advice: Integrals to which you will come are standard, but
computing them takes time. So, it is reasonable to use software.)


13. Find the formula for the variance of the loss in the case of a whole life insurance with fully
continuous premiums and the lifetime uniformly distributed on $[0, \omega]$. (Advice: Look up
Example 8.1.1-1. Clearly, the answer should involve the age x and $\delta$.)

14. Find concrete answers for the problem from Example 1.2.5-1a for $\delta = 0$ in terms of $e_x$ and
${}_np_x$.

15. In the situation of Example 1.4.2-2, for the case of lifetimes uniformly distributed on [0, 50]
and $\gamma = 0.05$, find the number n of contracts for which the security loading coefficient (with
respect to the benefit premium) is not larger than 5%.

16. Similarly to what we did in Example 1.5-1, write an equation for the exponential premium in
the case of uniformly distributed T(x). Find an asymptotic approximation for the premium
for large C.

17. One hundred 50-year-old clients buy a whole life insurance with a death benefit of $10,000
payable at the end of the year of death and with premiums paid at the beginning of each year.
Assume that, when calculating the actual premium, the actuary of the company adds k% to
the benefit premium. Using the Illustrative Table for $\delta$ = 4%, find k for which the probability
that the company will make a profit is greater than 0.95.

18. Consider two groups of clients of the same age x. In each group, the distribution of the future
lifetime is the same. However, if $T^{(1)}(x)$ and $T^{(2)}(x)$ are the lifetimes for typical clients from
the first and second group, respectively, then $P(T^{(1)}(x) > t) \ge P(T^{(2)}(x) > t)$ for ALL t.
Which group is healthier? For each characteristic below, figure out for which group it will be
larger: $A_x$, $\ddot a_x$, $A_{x:\overline{n}|}$, $\ddot a_{x:\overline{n}|}$, $P_x$, $P_{x:\overline{n}|}$.

19. Write an explicit formula for $P_x$ in the exponential case.

20. For a rough estimate of the premium on the fully continuous basis for a whole life insurance
contract, an actuary took as a force of mortality $\mu = 0.02$ and the interest rate $\delta = 0.06$ for
the benefit payment. The actuary proceeded from the equivalence principle. However, when
computing the APV of the premium annuity, the actuary used the quantity $k\delta$ as an interest
rate, where k is a coefficient. (Look up the remark at the end of Section 1.2.5.) Should k be
larger or smaller than one, in order to incorporate a safety loading into the premium? Find
the premium and graph it as a function of k. Find k for which the premium is 10% larger than
the net premium.

21. Estimate P (A3033) in the situation of Exercise 8.20. (Hint: One should be cautious when
applying (8.2.1.7) to term insurances.)

22. On her thirtieth birthday, Mary decided to enter into a pension plan paying $50,000 at the
beginning of each year, starting from Mary's 65th birthday. (a) Using the Illustrative Table
for $\delta$ = 4%, find the premium Mary should pay if the plan adds 5% to the net premium. (b)*
Recalculate the premium and benefits for the case when both are paid monthly. Clarify why
the premium turned out to be smaller.

23. (a) Consider a fully continuous whole life insurance on a lifetime T. Denote by $\pi$ the
(level) premium rate, by $\delta$ the interest rate, by L the random loss of the company, and by
$l(t)$ the loss of the company given T = t. Show that $l(t) = e^{-\delta t}\left(1 + \frac{\pi}{\delta}\right) - \frac{\pi}{\delta}$. Graph $l(t)$
and show that $P(L > 0) = P(T < t_0)$, where $t_0$ is a number such that $l(t_0) = 0$. Show that
$\pi = \delta/(\exp\{\delta t_0\} - 1)$ and for $P(L > 0) = P(T < t_0) \le \gamma$, we should choose

$$\pi \ge \frac{\delta\exp\{-\delta q_\gamma\}}{1 - \exp\{-\delta q_\gamma\}} = \frac{\delta}{\exp\{\delta q_\gamma\} - 1}, \tag{3.1}$$

where $q_\gamma$ is the $\gamma$-quantile of T.


(b) Show that the counterpart of (3.1) in the fully discrete time case is

$$\pi \ge \frac{dv^{q_\gamma+1}}{1 - v^{q_\gamma+1}}, \tag{3.2}$$

where $q_\gamma$ is the $\gamma$-quantile of K, and v is the discount factor.

(c) Using the Illustrative Table, estimate the percentile premium for the whole life insurance
on (50) for $\gamma = 0.05$. Compare it with the benefit premium computed in Exercise 9.

24.* Consider the situation of Example 1.2.4-1 and explain, from a heuristic point of view, why
the insured should pay more if the premium payments are provided monthly. Why is the
coefficient k in (1.2.15) large for G, = 1?

25. Explain from a common sense point of view that the premium for the plan in Example 1.2.5-1b
must be larger than the premium for the usual n-year deferred annuity plan. Show that it
follows from (1.2.16).

26.* By analogy with what we did in Section 1.3, derive the formula for the quantity $\bar s_{x:\overline{n}|}$, the
actuarial accumulated value at the end of the term of an n-year temporary annuity in the case
of continuous time.

Section 2*

27. Write a general formula for the benefit reserves for an n-year term insurance in the fully
continuous case. Write a precise expression for a constant mortality rate. Interpret the last
result.

28. Solve the problem of Example 2.3-2b for the fully discrete case.

29. Without any calculations, simplify the formulas of Section 2.3 for the case $\delta = 0$.

30. Write and analyze the formula for the benefit reserves for deferred annuities.

31. Consider a life (x) with $q_x = 0.02$, $q_{x+1} = 0.03$, and $q_{x+2} = 0.04$. Let v = 0.97.

(a) Find the level benefit premium and the benefit reserves for a special 3-year term
insurance with payments $c_1 = 10$, $c_2 = 20$, and $c_3 = 15$.

(b) Do the same for a 3-year endowment insurance with the death benefits as above and
with a payment of 20 at the beginning of the 4th year if the insured attains age x + 3.

(c) Do the same for a 3-year term insurance paying a death benefit of one unit plus half of
the benefit reserve at the end of the year of death, provided that the insured dies within
three years.

32. Using a spreadsheet technique, find the benefit premium and the benefit reserves for the
insurance from Example 8.4.1-3.

33. Let r = 1 + i where i is an annual interest. Consider an n-year deferred whole life annuity on
(x) with the provision for a death benefit equal to the benefit reserve, payable at the end of
the year of death. Show that the benefit reserve

$${}_kV = \frac{r^k - 1}{r^n - 1}\,\ddot a_{x+n} \quad \text{for } k < n.$$

Chapter 11 


Pension Plans 


This chapter concerns pension models. We have already considered the simplest one in
Section 10.1.2.1 (see, in particular, the last position in Table 1 there), where the level deferred
annuity may be viewed as future pension payments, and the temporary premium annuity as
contributions to a retirement account. In this case, the benefit premium is a net contribution
rate of the participant's payments for having a retirement annuity in the future, provided
that the participant survives to the retirement age. As we saw in Table 1 mentioned, for
example in the discrete time case, the net contribution rate (premium) per one unit of the
future pension rate equals

$$P({}_{n|}\ddot a_x) = \frac{{}_{n|}\ddot a_x}{\ddot a_{x:\overline{n}|}} = \frac{{}_np_x\,v^n\,\ddot a_{x+n}}{\ddot a_{x:\overline{n}|}},$$

where x is the participant's age at the moment of entering the pension plan and n is the time
to retirement.

In reality, the situation is much more complicated for many reasons. In particular, the 
future pension as well as the contributions to a retirement account may depend on the 
(changing in time) salary of the future retiree; the pension may depend on the age of the 
participant and may change in time due to inflation. This and other features will be reflected 
in models below. 

When an actuary is analyzing a pension plan, her/his main task is to check the plan’s sol- 
vency and provide conditions ensuring a certain balance between future benefit payments 
and present contributions to a retirement account. Due to the law of large numbers, such a 
balance may be achieved more easily if individual pension plans do not run separately but 
rather are parts of a common fund including many participants. The corresponding models 
in this case are more developed and are also considered below. 


1 VALUATION OF INDIVIDUAL PENSION PLANS 


We call a pension plan any arrangement of regular payments for life starting at a certain 
age. We use the term individual plan if we deal with one future retiree (usually, a participant 
of a plan), which we do in this section. 

Actuaries distinguish two broad categories of pension plans: defined benefit (DB) and 
defined contribution (DC) plans. 

In a DB plan, we start with a definition of the future (or projected) benefit; that is, what 
a worker (alone or with his/her spouse) can expect upon retirement. 




534 11. PENSIONS PLANS 


A DC plan specifies a fixed contribution from a worker and/or his/her employer to a 
retirement account. In this case, the benefit is an annuity that can be purchased at the mo- 
ment of retirement by the accumulated contributions made during the entire pre-retirement 
period. 


1.1 DB plans 
1.1.1 The APV of future benefits 


Consider a participant age x with h years in service. Denote by r the minimal retirement
age, and by y a projected time until the future retirement. In other words, the projected age
at which the participant will retire is x + y, and we assume $x + y \ge r$.

Let B = B(x, h, y) be the corresponding projected annual pension rate. It is worth
emphasizing that in the current model the function B(x, h, y) is predetermined. Note also that
since r is the minimal retirement age, B(x, h, y) = 0 for y < r − x.


EXAMPLE 1. Suppose that the minimal retirement age is r = 66, a participant age
x = 33 was hired 3 years ago, and her/his current annual salary is $w_0$ = $50,000. Suppose
the participant's salary grows at an annual rate of 2%. For simplicity, assume that this
growth is running continuously; so, at age x + y, the participant's salary will equal $w_0e^{0.02y}$.

Assume also that the annual pension will amount to 2.5% of the annual salary at the
moment of retirement multiplied by the number of years in service at the time of retirement.
Then, for y < 66 − 33 = 33 the rate B(x, h, y) = 0, and for $y \ge 33$,

$$B(x,h,y) = 0.025\,(h + y)\,w_0e^{0.02y} = 0.025\,(3 + y) \cdot 50{,}000\,e^{0.02y} = 1250\,(3 + y)\,e^{0.02y}. \tag{1.1.1}$$
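The benefit rate function of this example is straightforward to code. The sketch below is an illustration with our own function and parameter names; the default values are those of Example 1.

```python
from math import exp

def projected_benefit(y, x=33, h=3, r=66, w0=50_000.0,
                      growth=0.02, accrual=0.025):
    """Projected annual pension rate B(x, h, y) of (1.1.1).

    y is the projected time until retirement; the rate is zero if the
    participant would retire before the minimal retirement age r.
    """
    if x + y < r:
        return 0.0
    years_in_service = h + y
    final_salary = w0 * exp(growth * y)
    return accrual * years_in_service * final_salary

if __name__ == "__main__":
    for y in (30, 33, 35, 40):
        print(y, round(projected_benefit(y), 2))
```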

Later, we consider other examples of the benefit rate function B(x, h, y), but first let us 
find the APV of the future benefits given a function B(-). 

There are four reasons for which a participant may leave the pension plan: withdrawal,
disability, death, and retirement. We unify the first three decrements and denote by $\mu_x^{(1)}(y)$
the hazard rate corresponding to these decrements. Denote by $\mu_x^{(2)}(y)$ the hazard rate
corresponding to the last factor: retirement itself, and set $\mu_x^{(\tau)}(y) = \mu_x^{(1)}(y) + \mu_x^{(2)}(y)$.

(The multiple decrement scheme is presented in detail in Section 7.2, but it is enough for
the reader to look over the scheme at the end of Section 7.1.1. The only difference is that
here we are dealing with the remaining lifetime.)

We assume the factors above to be acting independently. In particular, this means that
the probability that the participant will not leave the plan before age x + y is

$${}_yp_x^{(\tau)} = \exp\left\{-\int_0^y \mu_x^{(\tau)}(s)\,ds\right\},$$

and the probability that the participant will retire and this will happen in the time interval
[y, y + dy] is equal to

$${}_yp_x^{(\tau)}\,\mu_x^{(2)}(y)\,dy, \tag{1.1.2}$$

provided $y \ge r - x$.

Certainly, for y < r − x, the probability under discussion vanishes, and we set $\mu_x^{(2)}(y) = 0$
for $0 \le y < r - x$.




The expression (1.1.2) needs to be clarified. Formally, it means that if a participant dies 
after the retirement age but before actually retiring, then the pension will not be paid. In 
other words, the participant does not have a beneficiary of her/his pension benefits. 

To model a situation with a beneficiary, we should just interpret the parameters above in a
different way. For example, we may define \(\mu_x^{(2)}(y)\), for y > r, as a hazard rate corresponding
to two possible decrements: retirement or death, and presuppose that once one of these
decrements occurs, the pension payments will begin. Also, in this case, for y > r, the
hazard rate corresponding to withdrawal may be set to be zero.

Let us denote by δ the risk-free interest rate, and by \(a_z^{(p)}\) the expected present value of the
pension annuity with a unit rate and starting at age z. (So, we regard \(a_z^{(p)}\) as an APV from
the standpoint of time z.)

In the no-beneficiary case, we may set \(a_z^{(p)} = \bar a_z\), or \(\ddot a_z\), or \(\ddot a_z^{(12)}\) (monthly payments),
depending on which pension annuity model we adopt.

If the participant has a beneficiary, in the case where the participant dies before retirement,
we may view \(a_z^{(p)}\) as the lump-sum paid to the beneficiary upon death of the participant.

In the model where the beneficiary is a spouse who will be receiving the pension if the
participant dies, \(a_z^{(p)}\) should involve survival probabilities for the spouse and be calculated
differently depending on whether the participant or the spouse receives the pension. (We
considered this case in Section 9.6.2.)


Proceeding from (1.1.2), we get that the APV of the pension benefits is equal to

\[ \int_{r-x}^{\infty} e^{-\delta y}\,B(x,h,y)\,a_{x+y}^{(p)}\;{}_y p_x^{(\tau)}\,\mu_x^{(2)}(y)\,dy. \tag{1.1.3} \]

EXAMPLE 2. Let us revisit Example 1. Assume that the hazard rate \(\mu_x^{(1)}(y)\) may be
well approximated by a constant μ = 0.006. Since x = 33, and the minimal retirement age
r = 66, the probability that the participant will still be in the plan at the moment when
retirement becomes possible is exp{-0.006 · 33} ≈ 0.82.

Next, we assume that the choice of the retirement age by the participant corresponds to
the uniform distribution on the interval [66, 70]. Then, for y ∈ [33, 37], the probability in
(1.1.2) equals

\[ \exp\left\{-\int_0^y \left[\mu_x^{(1)}(s) + \mu_x^{(2)}(s)\right] ds\right\} \mu_x^{(2)}(y)\,dy
 = \exp\left\{-\int_0^y \mu_x^{(1)}(s)\,ds\right\} \times \left[\exp\left\{-\int_0^y \mu_x^{(2)}(s)\,ds\right\} \mu_x^{(2)}(y)\right] dy
 = e^{-\mu y}\cdot\frac{1}{4}\,dy. \]

(Since \(\mu_x^{(2)}(y)\) corresponds to a uniform distribution, the factor in the brackets [ ] above is
just 1/4, the density of the distribution uniform on [33, 37].)
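The claim that the bracketed factor equals 1/4 can be checked numerically. The sketch below assumes that a retirement time uniform on [33, 37] has hazard rate 1/(37 - y) on that interval (the standard hazard of a uniform distribution); the helper names `mu2`, `integrate`, and `retirement_density` are ours.

```python
import math

Y_MIN, Y_MAX = 33.0, 37.0   # retirement at ages 66..70 for a participant aged 33

def mu2(y):
    """Retirement hazard for a retirement time uniform on [Y_MIN, Y_MAX]."""
    if y < Y_MIN or y >= Y_MAX:
        return 0.0
    return 1.0 / (Y_MAX - y)

def integrate(f, a, b, n=20_000):
    """Midpoint rule on [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

def retirement_density(y):
    """exp{-int_0^y mu2(s) ds} * mu2(y): the factor in brackets in the text."""
    survival = math.exp(-integrate(mu2, Y_MIN, y))   # mu2 vanishes before Y_MIN
    return survival * mu2(y)

for y in (33.5, 35.0, 36.5):
    print(y, retirement_density(y))

# Probability of still being in the plan when retirement becomes possible.
print(math.exp(-0.006 * 33))
```

Each printed density should be close to 0.25, whatever point of [33, 37) is chosen.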
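Suppose the risk-free interest rate δ = 0.04. Then, in accordance with (1.1.3), the APV of the
retirement annuity is

\[ \int_{33}^{37} e^{-0.04y}\cdot 1250\,(3+y)\,e^{0.02y}\,a_{33+y}^{(p)}\,e^{-0.006y}\cdot\frac{1}{4}\,dy
 = 312.5\int_{33}^{37} e^{-0.026y}\,(3+y)\,a_{33+y}^{(p)}\,dy. \]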




It remains to choose an appropriate function \(a_z^{(p)}\); it is noteworthy that it depends only on
the survival probabilities for the population to which the participant (and the beneficiary
if we consider the corresponding model) belong. Let us restrict ourselves to the
no-beneficiary case and set \(a_z^{(p)} = \bar a_z\), which may be considered an approximation for the APV
with monthly payments. Let us proceed from the data from the Illustrative Table in
Appendix, Section 3 and use the formulas \(\bar a_x = \frac{1}{\delta}\,(1 - \bar A_x)\) and \(\bar A_x = \frac{i}{\delta}\,A_x\). The values of
\(A_x\) are given in the table. Calculations lead to the values 11.76, 11.43, 11.10, 10.74, 10.39
for \(\bar a_x\), where x = 66, ..., 70 respectively; the corresponding plot is given in Fig. 1. It looks
practically linear, which allows us to accept, in this study example, the linear approximation

\[ \bar a_x = 10.39 + 1.37\,\frac{70 - x}{4} \quad \text{for } x \in [66, 70]. \]

FIGURE 1.


Thus, the desired APV equals

\[ 312.5\int_{33}^{37} e^{-0.026y}\,(3+y)\left(10.39 + 1.37\,\frac{37-y}{4}\right) dy \approx 211{,}652. \]
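The value of the last integral is easy to reproduce numerically. The following sketch assembles the integrand of (1.1.3) from the ingredients of Examples 1 and 2 and integrates it by Simpson's rule; the function names and the choice of quadrature rule are ours, and the annuity values come from the linear approximation above rather than from the table itself.

```python
import math

def annuity_value(age):
    """Linear approximation of the unit annuity APV on [66, 70], as in the text."""
    return 10.39 + 1.37 * (70.0 - age) / 4.0

def integrand(y, x=33, h=3.0):
    discount = math.exp(-0.04 * y)                     # risk-free rate delta = 0.04
    benefit = 1250.0 * (h + y) * math.exp(0.02 * y)    # B(x, h, y) from (1.1.1)
    stay = math.exp(-0.006 * y)                        # no withdrawal, disability, or death
    retire_density = 0.25                              # uniform retirement on [33, 37]
    return discount * benefit * annuity_value(x + y) * stay * retire_density

def simpson(f, a, b, n=400):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3.0

apv = simpson(integrand, 33.0, 37.0)
print(round(apv))
```

The printed value should agree with the figure quoted in the text up to rounding.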


1.1.2 More examples of the benefit rate function B(x,h,y) 


Most DB retirement plans involve the following three variables. 


• The salary of the participant. It may be the salary at the moment of retirement or an
average salary over the working period, computed with specified weights. Salaries
corresponding to periods closer to the retirement are typically weighted more heavily.
In the case of benefits not depending on salaries at all, one may proceed from a fixed
“universal” salary equal to, say, a unit of money.

• The accrual factor that specifies which part of the salary contributes to the pension.

• The number of years in service.


Quite often, these three characteristics are multiplied. For instance, in Example 1.1.1-1, 
we multiplied the salary at the moment of retirement by the accrual factor k = 0.025, and 
the amount obtained was multiplied by the number of years in service at the moment of 
retirement. 

The following is also worth emphasizing. Because we are talking about future benefits— 
and hence, about future salaries, we are dealing with projected (or estimated) salaries. 
Therefore, the function B(x,h,y) represents projected (or estimated) retirement benefits. 

Consider again a participant age x with h years in service. Denote by \(w^{(a)}(x)\) the actual
annual salary rate at age x, and by \(w^{(e)}(x,y)\) the estimated annual salary rate for the same
participant y years later. (Another notation uses the symbols \((AS)_x\) and \((ES)_{x+y}\),
respectively; see, e.g., [19].)

One of the ways to estimate a future salary consists in introducing a salary scale function
\(S_{x,y}\) reflecting salary increases that are due to merit, seniority, inflation, and other factors.




One possible example is the function \(S_{x,y} = e^{\beta y} s_{x,y}\), where the first factor reflects the
salary growth due to inflation (at a rate of β), and \(s_{x,y}\) is the factor representing the growth
due to individual merit increases. The relation between the estimated and actual salary is
given by the formula

\[ w^{(e)}(x,y) = w^{(a)}(x)\,\frac{S_{x,y}}{S_{x,0}}. \tag{1.1.4} \]

To simplify examples and exercises below, we adopt the presentation

\[ S_{x,y} = S_{x+y}, \tag{1.1.5} \]

where \(S_z\) is a scale function depending only on age.

Next, we consider several examples of the function B(x,h,y). The first two are similar to
Example 1.1.1-1.


1. The future pension rate is a fraction \(k_1\) of the final salary rate, i.e.,

\[ B(x,h,y) = k_1 w^{(e)}(x,y) = k_1 w^{(a)}(x)\,\frac{S_{x,y}}{S_{x,0}} = k_1 w^{(a)}(x)\,\frac{S_{x+y}}{S_x}. \tag{1.1.6} \]

2. Now, we include the number of years in service. The simplest way is to multiply
the previous pension rate by the number of years in service, but certainly, in this case, the
accrual factor may differ from \(k_1\). So, using the symbol \(k_2\), we write

\[ B(x,h,y) = k_2\,(h+y)\,w^{(e)}(x,y) = k_2\,(h+y)\,w^{(a)}(x)\,\frac{S_{x+y}}{S_x}. \tag{1.1.7} \]
3. The next natural step is to replace the estimated salary at the moment of retirement
by an average salary. It is quite common to replace \(w^{(e)}(x,y)\) in (1.1.6) and/or (1.1.7) by
the average of estimated salaries over the last m years prior to the projected retirement. For
certainty, consider the case m = 5, which is common in practice. There are several ways to
compute the average. First, we may replace \(w^{(e)}(x,y)\) by

\[ w^{(a)}(x)\,\frac{S_{x+y} + S_{x+y-1} + S_{x+y-2} + S_{x+y-3} + S_{x+y-4}}{5\,S_x}. \tag{1.1.8} \]

In a modified version, the last expression could be replaced by

\[ w^{(a)}(x)\,\frac{0.5\,S_{x+[y]} + S_{x+[y]-1} + S_{x+[y]-2} + S_{x+[y]-3} + S_{x+[y]-4} + 0.5\,S_{x+[y]-5}}{5\,S_x}, \tag{1.1.9} \]

where [y] is the integer part of y.

The motivation behind (1.1.9) is based on the assumption that retirement may occur at
an intermediate moment of a year. So, instead of the five last years, we take the six years,
but assign the weight 0.5 to the “end-point” years in the time interval considered.

Certainly, if y < 5, for the years that have already passed, estimated salaries should be
replaced by actual salaries.
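The two five-year averages can be compared in a short sketch. It assumes the averages are normalized by 5 S_x as in (1.1.8) and (1.1.9), takes the salary scale as a user-supplied function, and ignores the replacement of estimated salaries by actual ones for y < 5; the exponential scale and all function names are illustrative assumptions.

```python
import math

def final_average_simple(w_actual, x, y, scale, m=5):
    """Average of estimated salaries at ages x+y, x+y-1, ..., x+y-m+1, cf. (1.1.8)."""
    total = sum(scale(x + y - i) for i in range(m))
    return w_actual * total / (m * scale(x))

def final_average_end_weighted(w_actual, x, y, scale, m=5):
    """Version (1.1.9): m+1 salary points, with weight 0.5 at both end points."""
    base = x + math.floor(y)
    weights = [0.5] + [1.0] * (m - 1) + [0.5]
    total = sum(wt * scale(base - i) for i, wt in enumerate(weights))
    return w_actual * total / (m * scale(x))

def exponential_scale(age, beta=0.02):
    """Hypothetical scale S_z = exp(beta * z); an assumption for illustration only."""
    return math.exp(beta * age)

w, x, y = 50_000.0, 33, 33.0
final_salary = w * exponential_scale(x + y) / exponential_scale(x)
print(final_salary)
print(final_average_simple(w, x, y, exponential_scale))
print(final_average_end_weighted(w, x, y, exponential_scale))
```

With an increasing scale, both averages fall below the estimated final salary, and the end-weighted version is slightly lower because part of its weight sits on a salary one year further back.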


4. There is a version of the DB plan dealing with the average salary over the entire career. 
In this case the benefit is called a career average benefit. 




5. Let us generalize the above definitions of average salary. To simplify the formula 
below, denote by w(z) the salary of the participant at age z keeping in mind but suppressing 
in notation two cases. If z is less than or equal to the current age x, the symbol w(z) denotes 
the actual salary, while if z > x, then w(z) denotes the projected salary. 

Recall that the number of years in service at age x was denoted by h, and for a given y,
set K = h + y (the projected total length of the in-service period). Then the average salary
may be defined as

\[ w_{\text{average}}(x,y) = \int_0^K q(s)\,w(x-h+s)\,ds, \tag{1.1.10} \]

where q(s) is a weighting function. Since it is natural to place larger weights to years that
are closer to the moment of retirement, we assume that q(s) is a non-decreasing function.
One interesting particular example is given in [94]:

\[ q(s) = \gamma e^{-\gamma(K-s)}, \tag{1.1.11} \]

where the parameter γ > 0.
In this case, we assign the weight \(\gamma e^{-\gamma K}\) to the first salary earned (i.e., w(x - h), usually
the lowest salary), and the weight γ to the last salary (i.e., w(x + y), usually the highest).
To understand (1.1.11), assume first that the salary is flat: w(s) equals some number \(w_0\).
Then, by (1.1.10),

\[ w_{\text{average}}(x,y) = w_0\int_0^K \gamma e^{-\gamma(K-s)}\,ds = w_0\left(1 - e^{-\gamma K}\right), \tag{1.1.12} \]

which is close to the salary itself for large K.

Note also that the larger the parameter γ, the smaller the weights assigned to salaries
early in the service term.

Suppose now that the salary is growing exponentially at a rate of β starting with a salary
of \(w_0\) at the age x - h. In other words, \(w(x-h+s) = w_0 e^{\beta s}\). Substituting this and the
formula for q(s) from (1.1.11) into (1.1.10), we obtain that in this case,

\[ w_{\text{average}}(x,y) = w_0\,\frac{\gamma}{\gamma+\beta}\left(e^{\beta K} - e^{-\gamma K}\right). \tag{1.1.13} \]

Certainly, if β = 0 (the salary is flat), we come to (1.1.12).
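As a check on (1.1.12) and (1.1.13), the weighted average can be computed by direct numerical integration of (1.1.10) with the weights (1.1.11) and compared with the closed form. The parameter values and function names below are illustrative assumptions, not data from the text.

```python
import math

def weighted_average_salary(salary, K, gamma, n=20_000):
    """Midpoint-rule value of int_0^K gamma*exp(-gamma*(K-s)) * salary(s) ds."""
    step = K / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * step
        total += gamma * math.exp(-gamma * (K - s)) * salary(s)
    return total * step

def closed_form(w0, K, gamma, beta):
    """Right-hand side of (1.1.13); beta = 0 reduces it to (1.1.12)."""
    return w0 * gamma / (gamma + beta) * (math.exp(beta * K) - math.exp(-gamma * K))

w0, K, gamma, beta = 1.0, 36.0, 0.2, 0.02   # illustrative values only
numeric = weighted_average_salary(lambda s: w0 * math.exp(beta * s), K, gamma)
print(numeric, closed_form(w0, K, gamma, beta))

flat = weighted_average_salary(lambda s: w0, K, gamma)
print(flat, closed_form(w0, K, gamma, 0.0))
```

In both cases the numerical and closed-form values should coincide up to the quadrature error.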


Some particular examples may be found in Exercises 1-3. 


1.2 DC plans 

As was mentioned in the beginning of this section, for DC plans, we proceed from a
given contribution rate.

1.2.1 Calculations at the time of retirement


For simplicity, we consider a continuous-time model. Let [0, K] be a time interval. Con- 
sider a participant who entered a pension plan at time t = 0 and retires at time t = K. All 
evaluations will be provided from the standpoint of the retirement time. 




Let w(t) be the salary rate at time t ≤ K; that is, w(t)dt is the salary received during the
interval [t, t + dt]. Since the present time is the time of retirement, w(t) is known. Let c(t)
be the contribution rate per unit of salary; that is, during a period [t, t + dt] the participant
contributed the amount c(t)w(t)dt to the retirement account.

We adopt a model for which the contribution provided at time t is continuously increasing
at a rate ρ(t) due to investment. So, the contribution c(t)w(t)dt increases to
\(e^{\rho(t)(K-t)}\,c(t)\,w(t)\,dt\) by time K. Consequently, the total retirement capital at the moment of
retirement is equal to

\[ C = \int_0^K c(t)\,w(t)\,e^{\rho(t)(K-t)}\,dt. \tag{1.2.1} \]

The participant is free to use this capital as she/he wishes. If the participant chooses to
purchase a level life annuity with an APV of \(a^{(p)}\), then the net pension annual rate equals

\[ \pi = \frac{C}{a^{(p)}} = \frac{1}{a^{(p)}}\int_0^K c(t)\,w(t)\,e^{\rho(t)(K-t)}\,dt. \tag{1.2.2} \]
EXAMPLE 1. Consider an employee who is an active member of a pension plan, and
assume that the contribution rate is constant: c(t) = c, as is the investment rate: ρ(t) = ρ.
Assume also that the salary is growing at a constant rate β, that is, \(w(t) = w_0 e^{\beta t}\). Suppose
ρ > β. Then the accumulated capital equals

\[ C = c\,w_0\int_0^K e^{\beta t}\,e^{\rho(K-t)}\,dt = c\,w_0\,\frac{e^{\rho K} - e^{\beta K}}{\rho - \beta}. \tag{1.2.3} \]

Suppose the participant entered the plan at age 30 and retires at age 66; so, K = 36. Suppose
further that the salary growth rate is β = 0.02; the investment rate is ρ(t) = 0.08; the
employee was contributing to the retirement account at a rate of 3.5% and the employer
was contributing for the participant at the same rate. That is, the total contribution rate is
c(t) = 0.07. Then, by (1.2.3), it is easy to compute that the total accumulated capital per
$1 of the initial salary \(w_0\) is equal to ≈ 18.38.

As for \(a^{(p)}\), let us take \(\bar a_{66}\) and adopt the value 11.76 obtained in Example 1.1.1-2. Then,
the pension rate per $1 of the initial salary is

\[ \pi = \frac{18.38}{11.76} \approx 1.56. \]

For example, if the participant started at age 30 with a salary of $40,000, then his (net)
annual pension rate will amount to ≈ $62,400 (while his annual salary at the moment of
retirement is equal to \(\$40{,}000\,e^{0.02\cdot 36} \approx \$82{,}177\)).

The reader has probably noticed that while we set the investment rate ρ = 0.08, the value
of \(\bar a_{66}\) has been calculated proceeding from a risk-free rate of δ = 0.04 (since we used
the Illustrative Table). This is not meaningless: these characteristics are different, and the
former is greater than the latter. Of course, the larger the investment rate is, the larger the
pension. On the other hand, aggressive investment (hoping for a high rate) is risky.

In the model of this subsection, calculations are provided from the standpoint of the
retirement time, so we do know what the investment rate was. It is easy to calculate that
for ρ = 0.06, the pension rate would be $39,385, and if ρ = 0.04 (the risk-free rate), the
pension decreases to $25,788.
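The figures of this example can be reproduced from (1.2.3) with a few lines of Python. The sketch below is only a numerical illustration; the function name `accumulated_capital` is ours, and the annuity value 11.76 is taken from the text.

```python
import math

def accumulated_capital(c, rho, beta, K, w0=1.0):
    """Closed form (1.2.3); requires rho != beta."""
    return c * w0 * (math.exp(rho * K) - math.exp(beta * K)) / (rho - beta)

ANNUITY_66 = 11.76          # unit annuity APV adopted in the text
K, beta, c = 36, 0.02, 0.07
initial_salary = 40_000.0

for rho in (0.08, 0.06, 0.04):
    capital = accumulated_capital(c, rho, beta, K)       # per $1 of initial salary
    pension = initial_salary * capital / ANNUITY_66      # annual pension rate
    print(rho, round(capital, 3), round(pension))

print(round(initial_salary * math.exp(beta * K)))        # salary at retirement
```

Without intermediate rounding, the pension for ρ = 0.08 comes out somewhat above $62,400; the figure in the text results from first rounding the ratio 18.38/11.76 to 1.56.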




1.2.2 Calculations at the time of entering a plan 


We will use again a continuous time model and restrict ourselves to a particular scheme. 

Suppose that at time t = 0, a participant age x enters a pension plan, and she/he is plan- 
ning to retire at time K. Now, we compute the APV of the future contributions to the plan, 
and all evaluations will be provided from the standpoint of the initial time, i.e., t = 0. 

Unlike what we did in Section 1.2.1, we will not include in our calculations the possibility 
of further growth of future contributions; for example, the contributions may go to a saving 
account. 

Assume also that the plan includes a provision of returning a fraction d of the accu- 
mulated contributions with interest to the participant or her/his beneficiary in the case the 
participant leaves the plan before time K regardless of the reason. 

Let w(t) be the (projected) salary at time t ∈ [0, K], and c(t) be the (projected) contribution
rate per unit of salary.

Denote by \({}_t p_x\) the probability that the participant will not leave the plan before time
t ∈ [0, K]. In accordance with the general formula (9.1.1.7), without the
returning-contribution-provision, the APV of the future contributions would be equal to the quantity

\[ C_1 = \int_0^K e^{-\delta t}\,c(t)\,w(t)\;{}_t p_x\,dt, \]

where δ is a risk-free rate (rather than an investment rate ρ we used above).

To compute the APV of the return of contributions, we reason in the following way. If
the plan returns the fraction d of the contributions in any case, such an APV would be
exactly equal to \(d\cdot C_1\). As a matter of fact, the plan does not return the contributions if the
participant is still a member of the plan at time K, which occurs with probability \({}_K p_x\).

Hence, the APV of the part of the contribution that the plan does not return is \(d\cdot C_2\),
where

\[ C_2 = {}_K p_x\int_0^K e^{-\delta t}\,c(t)\,w(t)\,dt. \]

Consequently, the APV of the contributions equals the quantity

\[ C = C_1 - d\,(C_1 - C_2) = (1-d)\,C_1 + d\,C_2
 = (1-d)\int_0^K e^{-\delta t}\,c(t)\,w(t)\;{}_t p_x\,dt + d\cdot{}_K p_x\int_0^K e^{-\delta t}\,c(t)\,w(t)\,dt. \tag{1.2.4} \]


The next step may be either to find a contribution rate for which C is equal to the APV 
of a desired pension annuity or to figure out which annuity will be affordable for a given 
contribution rate. 


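EXAMPLE 1. For illustrative purposes, consider the “entirely exponential” case:
\(w(t) = w(0)\,e^{\beta t}\), and \({}_t p_x = e^{-\mu t}\), where μ is a leaving-hazard-rate. Let also c(t) = c, and δ > β.
Then, with \(w_0 = w(0)\),

\[ C = c\,w_0\left\{(1-d)\int_0^K e^{-(\delta+\mu-\beta)t}\,dt + d\,e^{-\mu K}\int_0^K e^{-(\delta-\beta)t}\,dt\right\}
 = c\,w_0\left\{\frac{1-d}{\delta+\mu-\beta}\left(1 - e^{-(\delta+\mu-\beta)K}\right) + \frac{d\,e^{-\mu K}}{\delta-\beta}\left(1 - e^{-(\delta-\beta)K}\right)\right\}. \tag{1.2.5} \]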




As in Example 1.2.1-1, set K = 36, β = 0.02, c = 0.07. Let δ = 0.04, μ = 0.015, and
d = 0.9. Substituting all of this into (1.2.5), we get

\[ C \approx 1.085\,w_0. \tag{1.2.6} \]

Suppose that the retirement age is 66, and the participant enters the plan at age 30. As
above, we use as an approximation the continuous-time APV \(\bar a_{66} = 11.76\) (see Example
1.1.1-2).

Next, we should recollect that we are dealing with a 36-year deferred annuity. If for
the survival probability we accept the representation \({}_t p_x = e^{-\mu t}\) and for the discount, the
quantity \(e^{-\delta t}\), then the APV of the future pension annuity is

\[ {}_{36|}\bar a_{30} = {}_{36}p_{30}\,e^{-36\delta}\,\bar a_{66} = e^{-(0.015+0.04)\cdot 36}\cdot 11.76 \approx 1.624. \]

Combining this with (1.2.6), we conclude that per one dollar of the initial salary \(w_0\), the
(net) pension annual rate is \(\frac{1.085}{1.624} \approx 0.668\). Thus, for each one dollar of the initial salary,
the participant will receive ≈ 66.8¢ of the annual pension paid in monthly portions (we
consider \(\bar a_{66}\) as an approximation of such a payment). For instance, for \(w_0 = \$40{,}000\), the
participant may hope for a (net) annual pension rate of ≈ $26,720.
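The numbers of this example follow directly from (1.2.5). A minimal sketch, with function and variable names of our own choosing and the annuity value 11.76 taken from the text:

```python
import math

def contributions_apv(c, delta, mu, beta, d, K, w0=1.0):
    """Closed form (1.2.5) for the 'entirely exponential' case."""
    a = delta + mu - beta      # discount plus leaving hazard, net of salary growth
    b = delta - beta           # discount net of salary growth
    survival_weighted = (1.0 - d) / a * (1.0 - math.exp(-a * K))
    kept_if_staying = d * math.exp(-mu * K) / b * (1.0 - math.exp(-b * K))
    return c * w0 * (survival_weighted + kept_if_staying)

K, beta, c, delta, mu, d = 36, 0.02, 0.07, 0.04, 0.015, 0.9

C = contributions_apv(c, delta, mu, beta, d, K)          # per $1 of initial salary
deferred_annuity = math.exp(-(mu + delta) * K) * 11.76   # 36-year deferred annuity
rate = C / deferred_annuity                              # pension per $1 of initial salary

print(round(C, 3), round(deferred_annuity, 3), round(rate, 3))
print(round(40_000 * rate))
```

The last printed value differs slightly from $26,720 because the text rounds the ratio to 0.668 before multiplying by the initial salary.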


2 PENSION FUNDING. COST METHODS 


This section differs from the previous in two respects. First, we consider a fund with 
many participants rather than one future retiree, and our main concern is the viability of the 
fund. Secondly, the models below are dynamic, and we pay attention to the evolution of 
funds in time. The model of Subsection 2.1 is similar to that in [19, Chapter 20]. 


2.1 A dynamic DB fund model 
2.1.1 Modeling enrollment flow and pension payments 


Consider a fund in which all participants enroll at age a and retire at age r. Suppose 
that in a time interval [t,t + dt], the number of new entrants (age a) equals n(t)dt; in other 
words, n(t) is the enrolling intensity at time t. Certainly, the actual process is random, so 
we view n(t) as a mean characteristic. 

To make the formulas below less cumbersome, we assume the existence of an initial time
t = 0 and set the function n(t) = 0 for t < 0.

Let \(s_a(x)\), x ≥ a, be the probability that a new entrant will be still a member of the fund
at age x. We may view \(s_a(x)\) as a survival function concerning remaining life, and we will
call it as such, although \(s_a(x)\) also concerns withdrawal from the plan for reasons other than
death. By definition, \(s_a(a) = 1\). In the notation of Chapter 7, \(s_a(x) = {}_{x-a}p_a\).

Now, let us consider the mean number of participants attaining age x (or more precisely,
ages in [x, x + dx]) during the interval [t, t + dt]. These are those people who entered the
fund x - a years ago and survived to age x. So, the mean number in question is

\[ n(t-(x-a))\,s_a(x)\,dt = n(t-x+a)\,s_a(x)\,dt. \tag{2.1.1} \]




(More precisely, the number of entrants with age in [x, x + dx] is \((n(t-(x-a))\,s_a(x)\,dt)\,dx\).
Since we set n(t) = 0 for t < 0, the last expression vanishes for x > a + t, which is natural:
if the process starts at time t = 0, then at time t the maximal possible age for a participant
is a + t.)

Suppose that at time t, all participants at age x have the same salary rate w(x,t); that is,
w(x,t)dt is the salary earned during the interval [t, t + dt]. Thus, the mean salary depends
only on age and current time. The latter type of dependence may reflect inflation and/or the
common growth of productivity. A typical presentation is \(w(x,t) = w(x)\,e^{\tau t}\), where τ is a
rate due to the second type of dependency, and w(x) reflects the dependence on age.

Then, the total annual salary rate at time t is

\[ W_t = \int_a^r n(t-x+a)\,s_a(x)\,w(x,t)\,dx. \tag{2.1.2} \]

One may say that \(W_t\,dt\) is the total payroll payment during the interval [t, t + dt].

Next, we specify future pensions; so we consider a DB plan. Suppose that for a participant
who retires at time t, the initial pension rate (that is, the rate at time t) is a fraction k
of the salary at the moment of retirement, i.e., k w(r,t).

Usually, the pension rate is not flat, and next we consider the pension rate π(x,t) of a
retiree age x ≥ r at time t. For such a retiree, the retirement occurred x - r years ago; i.e.,
at time t - (x - r) = t - x + r. We assume that the pension rate π(x,t) is equal to the initial
pension multiplied by an adjustment factor h(x) depending only on the age x and, naturally,
such that h(r) = 1. Thus,

\[ \pi(x,t) = \pi(r,\,t-x+r)\,h(x) = k\,w(r,\,t-x+r)\,h(x). \tag{2.1.3} \]


2.1.2 Normal cost 


In this subsection, we restrict ourselves to the terminal funding case where a single
contribution to the fund is made for participants who retire at time t. Our object of study is
the contribution rate, or in other terms the normal cost rate, \(S_t\) (other notations used include
\((NC)_t\) or \(TP_t\)). More precisely, \(S_t\,dt\) is the required contribution to the fund during the period
[t, t + dt] to ensure (on the average, since we consider net characteristics) the payment of
future pensions for new retirees, i.e., those who retire in the interval [t, t + dt].

Let \(\bar a_r^{(h)}\) be the APV of the pension annuity with the adjustment factor h(·) per unit initial
pension rate payable continuously starting from the retirement age r. The APV \(\bar a_r^{(h)}\) is
calculated from the standpoint of the retirement time.

The probability that a retiree will attain age x is \(s_a(x)/s_a(r)\), and if the initial pension rate
is one, then the pension rate at age x is 1 · h(x) = h(x). As usual, let δ be a risk-free interest
rate. Then, in accordance with the general formula (9.1.1.7), making the variable change
y = x - r in the second step, we have

\[ \bar a_r^{(h)} = \int_r^{\infty} e^{-\delta(x-r)}\,h(x)\,\frac{s_a(x)}{s_a(r)}\,dx = \int_0^{\infty} e^{-\delta y}\,h(r+y)\,\frac{s_a(r+y)}{s_a(r)}\,dy. \tag{2.1.4} \]


Let

\[ \tilde s_z(y) = \frac{s_a(z+y)}{s_a(z)}. \]

This is the probability that a participant will be still a member of the plan (perhaps as a
retiree) at age z + y given that this was true at age z. For us it is worth remembering that
for z ≥ r, the survival function \(\tilde s_z(y)\) is that of the remaining life and is based only on the
mortality force after time z. Clearly, \(\tilde s_z(0) = 1\).

Set also \(\tilde h(y) = h(r+y)\), and note that \(\tilde h(0) = 1\).

So, with this new notation we can rewrite (2.1.4) as

\[ \bar a_r^{(h)} = \int_0^{\infty} e^{-\delta y}\,\tilde h(y)\,\tilde s_r(y)\,dy. \tag{2.1.5} \]

By (2.1.1), the (mean) number of participants attaining the retirement age r in the time
interval [t, t + dt] is \(n(t-r+a)\,s_a(r)\,dt\), and these people start to collect pensions at the
initial rate k w(r,t). Consequently, the normal cost \(S_t = k\,w(r,t)\,n(t-r+a)\,s_a(r)\,\bar a_r^{(h)}\). Writing
the factors that do not depend on t first, we have

\[ S_t = k\,s_a(r)\,\bar a_r^{(h)}\,w(r,t)\,n(t-r+a). \tag{2.1.6} \]

(For t < r - a, this expression vanishes, which is understandable: there are no retirees yet.)


EXAMPLE 1 (The “exponential” case.) First, let \(h(x) = e^{\alpha(x-r)}\), where α is a constant
rate. Then \(\tilde h(y) = e^{\alpha y}\). It is natural to assume α ≤ δ (the pension growth rate is not larger
than the risk-free interest rate). Substituting into (2.1.5), we have

\[ \bar a_r^{(h)} = \int_0^{\infty} e^{-(\delta-\alpha)y}\,\tilde s_r(y)\,dy. \tag{2.1.7} \]

The r.-h.s. looks like a “standard” level annuity with a risk-free rate δ - α and depends only
on the survival function \(\tilde s_r(y)\). Denote the r.-h.s. of (2.1.7) by \(\bar a_r^{(\delta-\alpha)}\).

Next, suppose that n(t) is growing exponentially, and set \(n(t) = n_0 e^{\omega t}\) for t ≥ 0, where ω
is the corresponding population growth rate. As above, n(t) = 0 for t < 0.

For the salary, we adopt the presentation \(w(x,t) = w(x)\,e^{\tau t}\) already mentioned in Section
2.1.1.

Thus, for t ≥ r - a,

\[ S_t = k\,w(r)\,s_a(r)\,\bar a_r^{(\delta-\alpha)}\,e^{\tau t}\,n_0 e^{\omega(t-r+a)}
 = \left(k\,n_0\,w(r)\,s_a(r)\,\bar a_r^{(\delta-\alpha)}\,e^{-\omega(r-a)}\right) e^{(\tau+\omega)t} = S_0\,e^{(\tau+\omega)t}. \tag{2.1.8} \]

Thus in the exponential case, the normal cost (i.e., the required contribution rate) is growing
exponentially, and this growth is specified by the population and salary growth rate. As
follows from (2.1.8), the coefficient \(S_0 = k\,n_0\,w(r)\,s_a(r)\,\bar a_r^{(\delta-\alpha)}\,e^{-\omega(r-a)}\).

Since in the case under consideration the growth of the number of new active participants
as well as the salary growth are also exponential, one may hope that with appropriate
contributions of active participants, the fund will be viable.
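The exponential growth of the normal cost can be illustrated numerically. The sketch below evaluates (2.1.6) directly, with an exponentially growing enrollment intensity and salary, and compares it with the factored form in (2.1.8). All parameter values (including the annuity factor, here called `annuity`) and the function name are illustrative assumptions rather than data from the text.

```python
import math

def normal_cost_rate(t, k, n0, w_r, s_r, annuity, omega, tau, r, a):
    """S_t from (2.1.6) with n(t) = n0*exp(omega*t) and w(r, t) = w_r*exp(tau*t)."""
    if t < r - a:
        # Nobody has reached the retirement age yet.
        return 0.0
    entrants = n0 * math.exp(omega * (t - r + a))   # n(t - r + a)
    salary = w_r * math.exp(tau * t)                # w(r, t)
    return k * s_r * annuity * salary * entrants

# Illustrative values; none of them come from the text.
params = dict(k=0.6, n0=100.0, w_r=80_000.0, s_r=0.7, annuity=14.0,
              omega=0.01, tau=0.03, r=66, a=30)

# Coefficient S_0 of (2.1.8).
s0 = (params["k"] * params["n0"] * params["w_r"] * params["s_r"]
      * params["annuity"] * math.exp(-params["omega"] * (params["r"] - params["a"])))

t = 50.0
direct = normal_cost_rate(t, **params)
factored = s0 * math.exp((params["tau"] + params["omega"]) * t)
print(direct, factored)
```

Both numbers should agree up to floating-point rounding, and the rate is zero for t < r - a.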


2.1.3 The benefit payment rate and the APV of future benefits 


At time t, the number of retirees with ages in [x, x + dx] equals \(n(t-(x-a))\,s_a(x)\,dx\). Since
we are talking about retirees, x ≥ r and these retirees retired x - r years ago.




In accordance with (2.1.3), the initial pension rate for these people is k w(r, t - x + r), and
the current pension rate is π(x,t) = k w(r, t - x + r) h(x). Hence, for t ≥ r - a, the benefit
payment rate is

\[ B_t = \int_r^{\infty} n(t-x+a)\,s_a(x)\,\pi(x,t)\,dx = k\int_r^{\infty} n(t-x+a)\,s_a(x)\,w(r,\,t-x+r)\,h(x)\,dx. \tag{2.1.9} \]

(As a matter of fact, the integration runs only from r to a + t as the integrand equals zero
for x > a + t.) It is worthwhile to emphasize that \(B_t\) is the rate of current pension payment;
that is, at time t.

Now, we turn to the APV of future benefit payments. Consider a retiree age x. Similarly
to (2.1.5), for a unit initial pension rate, the APV of the future (remaining) pension annuity
is

\[ \bar a_x^{(h)} = \int_0^{\infty} e^{-\delta y}\,h(x+y)\,\tilde s_x(y)\,dy = \int_0^{\infty} e^{-\delta y}\,\tilde h(x-r+y)\,\tilde s_x(y)\,dy, \tag{2.1.10} \]

where as above, δ is a risk-free rate. As has been already noticed, since x ≥ r, the survival
function \(\tilde s_x(y)\) is based only on mortality.
Thus, the APV of the future pension payments is

\[ \int_r^{\infty} n(t-x+a)\,s_a(x)\cdot k\,w(r,\,t-x+r)\,\bar a_x^{(h)}\,dx = k\int_r^{\infty} n(t-x+a)\,s_a(x)\,w(r,\,t-x+r)\,\bar a_x^{(h)}\,dx. \tag{2.1.11} \]

The actuarial notation for this quantity is \((rA)_t\).


EXAMPLE 1. (The “exponential-uniform” case.) Let us again consider the exponential
case setting \(h(x) = e^{\alpha(x-r)}\), \(n(t) = n_0 e^{\omega t}\), and \(w(x,t) = w(x)\,e^{\tau t}\). However, for \(s_a(x)\) we
will accept a more realistic assumption than that of the exponential case, that is, where the
mortality rate is constant. In Section 7.1.6, we saw that for elderly people the distribution of
the remaining life time was close to a uniform distribution. So, we assume that the survival
function for the remaining life time of a retiree, i.e., \(\tilde s_r(y)\), corresponds to the uniform
distribution on an interval [0, d]. Say, if r = 66, as an approximation, we can set d = 30,
certainly realizing that, as a matter of fact, people may live longer than 96 years. Thus for
x ≥ r, we will use the representation

\[ s_a(x) = s_a(r)\left(1 - \frac{x-r}{d}\right), \tag{2.1.12} \]

where \(s_a(r)\) is a fixed quantity: the probability that a member chosen at random will attain
the retirement age r. Then, assuming t ≥ (r - a) + d, proceeding from (2.1.9), and making
the variable change y = x - r, we have

\[ B_t = k\int_r^{r+d} n_0 e^{\omega(t-x+a)}\,s_a(r)\left(1 - \frac{x-r}{d}\right) w(r)\,e^{\tau(t-x+r)}\,e^{\alpha(x-r)}\,dx \tag{2.1.13} \]

\[ \phantom{B_t} = k\,n_0\,s_a(r)\,w(r)\int_0^d e^{\omega(t-y-r+a)}\left(1 - \frac{y}{d}\right) e^{\tau(t-y)}\,e^{\alpha y}\,dy \]

\[ \phantom{B_t} = k\,n_0\,s_a(r)\,w(r)\,e^{(\omega+\tau)t}\,e^{-\omega(r-a)}\int_0^d \left(1 - \frac{y}{d}\right) e^{-(\tau+\omega-\alpha)y}\,dy \tag{2.1.14} \]

\[ \phantom{B_t} = \left(k\,n_0\,s_a(r)\,w(r)\,e^{-\omega(r-a)}\,b_1\right) e^{(\tau+\omega)t}, \tag{2.1.15} \]




where \(b_1 = b_1(d, \omega, \tau, \alpha)\) is the value of the integral in (2.1.14). Note that \(b_1\) does not
depend on time t or the retirement age r. Thus, in the case under consideration, the benefit
rate is growing exponentially, which might be expected. In Exercise 15, we discuss other
aspects of (2.1.15).

Now, let us turn to the APV. We know that for a uniform r.v. X, the conditional distribution
of X given X > x is uniform. Hence, the survival function \(\tilde s_x(y)\) corresponds to the
distribution uniform on [0, r + d - x]. So, by virtue of (2.1.10), for x ∈ [r, r + d], we have

\[ \bar a_x^{(h)} = \int_0^{r+d-x} e^{-\delta y}\,e^{\alpha(y+x-r)}\left(1 - \frac{y}{r+d-x}\right) dy
 = e^{\alpha(x-r)}\int_0^{d-(x-r)} e^{-(\delta-\alpha)y}\left(1 - \frac{y}{d-(x-r)}\right) dy = e^{\alpha(x-r)}\,b_2(x-r), \]

where for s < d

\[ b_2(s) = b_2(s;\,d, \delta, \alpha) = \int_0^{d-s} e^{-(\delta-\alpha)y}\left(1 - \frac{y}{d-s}\right) dy. \]

For s = d, we may set \(b_2(d) = \lim_{s\to d-} b_2(s) = 0\).

When substituting all of the above representations into (2.1.11), it is convenient to keep
in mind that formulas (2.1.9) and (2.1.11) are similar. Using the variable change y = x - r
in the second step, we have

\[ (rA)_t = k\int_r^{r+d} n_0 e^{\omega(t-x+a)}\,s_a(r)\left(1 - \frac{x-r}{d}\right) w(r)\,e^{\tau(t-x+r)}\,e^{\alpha(x-r)}\,b_2(x-r)\,dx \tag{2.1.16} \]

\[ \phantom{(rA)_t} = k\,n_0\,s_a(r)\,w(r)\int_0^d e^{\omega(t-y-r+a)}\left(1 - \frac{y}{d}\right) e^{\tau(t-y)}\,e^{\alpha y}\,b_2(y)\,dy \]

\[ \phantom{(rA)_t} = k\,n_0\,s_a(r)\,w(r)\,e^{(\omega+\tau)t}\,e^{-\omega(r-a)}\int_0^d \left(1 - \frac{y}{d}\right) e^{-(\tau+\omega-\alpha)y}\,b_2(y)\,dy \tag{2.1.17} \]

\[ \phantom{(rA)_t} = \left(k\,n_0\,s_a(r)\,w(r)\,e^{-\omega(r-a)}\,b_3\right) e^{(\tau+\omega)t}, \tag{2.1.18} \]

where \(b_3 = b_3(d, \omega, \tau, \alpha)\) is the value of the integral in (2.1.17).
So, the APV is also growing exponentially at the same rate as in (2.1.15). We discuss the
last formula in more detail in Exercise 15.
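The constants of this example have no simple names in the text beyond b with subscripts 1, 2, 3, but they are easy to evaluate numerically. The sketch below computes them by Simpson's rule and checks that the benefit rate obtained by integrating over ages x in [r, r + d], as in (2.1.13), matches the factored form (2.1.15). Parameter values, function names, and the quadrature rule are illustrative assumptions.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson rule; n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3.0

def b1(d, omega, tau, alpha):
    """Integral in (2.1.14)."""
    return simpson(lambda y: (1 - y / d) * math.exp(-(tau + omega - alpha) * y), 0.0, d)

def b2(s, d, delta, alpha):
    """Annuity factor for a retiree s years past retirement; b2(d) is set to 0."""
    if s >= d:
        return 0.0
    span = d - s
    return simpson(lambda y: math.exp(-(delta - alpha) * y) * (1 - y / span), 0.0, span)

def b3(d, omega, tau, alpha, delta):
    """Integral in (2.1.17); the inner factor b2 is itself computed numerically."""
    return simpson(
        lambda y: (1 - y / d) * math.exp(-(tau + omega - alpha) * y) * b2(y, d, delta, alpha),
        0.0, d, n=200)

def benefit_rate_direct(t, k, n0, s_r, w_r, omega, tau, alpha, r, a, d):
    """B_t by integrating (2.1.13) over ages x in [r, r + d]."""
    def integrand(x):
        entrants = n0 * math.exp(omega * (t - x + a))
        alive = s_r * (1 - (x - r) / d)
        pension = w_r * math.exp(tau * (t - x + r)) * math.exp(alpha * (x - r))
        return entrants * alive * pension
    return k * simpson(integrand, r, r + d)

def benefit_rate_closed(t, k, n0, s_r, w_r, omega, tau, alpha, r, a, d):
    """B_t from the factored form (2.1.15)."""
    coeff = k * n0 * s_r * w_r * math.exp(-omega * (r - a)) * b1(d, omega, tau, alpha)
    return coeff * math.exp((tau + omega) * t)

# Illustrative values; none of them come from the text.
args = dict(k=0.6, n0=100.0, s_r=0.7, w_r=80_000.0,
            omega=0.01, tau=0.03, alpha=0.02, r=66, a=30, d=30.0)
t = 80.0   # satisfies t >= (r - a) + d

print(benefit_rate_direct(t, **args), benefit_rate_closed(t, **args))
print(b3(args["d"], args["omega"], args["tau"], args["alpha"], delta=0.04))
```

Note that in this sketch `b3` takes δ as an explicit argument, since the inner factor depends on it.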


2.2 More on cost methods 


In a sense, the models of this section are simpler than the previous because the fund we 
consider does not accept new participants after it has been established at an initial time. On 
the other hand, we explore new details. Time below is discrete, and all plans are DB plans. 

The models below are similar to some models in [4]; the reader may find more details 
there, though our exposition is somewhat different. 


2.2.1 The unit-credit method 


For an integer time moment t denote the set of all active (non-retired) participants by \(\mathcal{A}_t\);
the age of the j-th participant at time t by \(x_j\) (the second index t is suppressed); and the
retirement age, the same for all participants, by r. Suppose the fund was established at an initial
time, the age of participant j at this time was \(a_j\), and there were no enrollments after that.
So, for active (non-retired) participants \(a_j \le x_j < r\).

For simplicity, we assume all x’s, a’s, and r to be integers. To make notations less
cumbersome (especially when it does not cause misunderstanding), we will omit the index
j in \(x_j\), \(a_j\), and other individual characteristics.

For a participant, denote by B(x) a projected pension rate that the participant has already
earned by age x. (As a matter of fact, \(B(x) = B_j(x_j)\); we suppress the index in notation.)
The function B(x) is non-decreasing, and it is natural to set B(a) = 0. We call B(x) an
accrued benefit rate at age x. So, B(r) is the maximal benefit rate for which the participant
can hope. The pension annuity is assumed to be level at the rate B(r).

Thus, at age x < r, the APV of the participant’s accrued benefit is \(B(x)\cdot v^{r-x}\cdot{}_{r-x}p_x\cdot\ddot a\),
where as usual v is a discount, \({}_s p_x\) is the probability that the participant will be an active
member s years later, and \(\ddot a\) is the APV of the future pension annuity at a unit rate from the
standpoint of the retirement time.

For example, \(\ddot a\) may be \(\ddot a_r^{(12)}\), the APV of the discrete time annuity paid monthly with a
unit annual rate.

For simplicity, we call \({}_s p_x\) a survival function and assume it to be the same for all
participants.

Thus, the APV of the total future pension benefits from the standpoint of time t is

\[ \sum_{j\in\mathcal{A}_t} B_j(x_j)\,v^{r-x_j}\;{}_{r-x_j}p_{x_j}\,\ddot a. \tag{2.2.1} \]
jeA, 


For the fund to be viable (on the average), this quantity should equal the APV of the
obligations of the fund; so we call the last expression an accrued liability and denote it by
\((AL)_t\).

The accrued liability changes in time for two reasons: the participants are getting older,
and the structure of the active group may change. As was said, we assume that there are
no new entrants. However, we should take into account that (a) some contracts may be
terminated due to death or withdrawal, (b) once participants reach the retirement age, they
cease to be active members.

Regarding a particular participant, how much should be added to the fund to take into 
account these changes? 

If participant j at age \(x_j\) is becoming one year older, then the accrued benefit rate will
grow by \(\Delta B_j = \Delta B_j(x_j) = B_j(x_j+1) - B_j(x_j)\). However, this change will be needed only
if the participant survives to the retirement age, which will happen with probability \({}_{r-x_j}p_{x_j}\).
Also, the payments will start when the participant attains age r. So, on the average, the
amount that should be added to the fund is \(\Delta B_j\,v^{r-x_j}\;{}_{r-x_j}p_{x_j}\,\ddot a\). We call this (net!)
characteristic a normal cost of the plan at time t and denote it by \((NC)_{jt}\). So,

\[ (NC)_{jt} = \Delta B_j\,v^{r-x_j}\;{}_{r-x_j}p_{x_j}\,\ddot a. \tag{2.2.2} \]

The total normal costs for the whole fund at time t are

\[ (NC)_t = \sum_{j\in\mathcal{A}_t} (NC)_{jt}. \]
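The accrued liability (2.2.1) and the total normal cost can be sketched for a small group of participants. The linear accrual rule, the geometric survival function, and all numerical values below are hypothetical assumptions chosen only to make the sketch runnable; the text leaves B(x) and the survival function unspecified.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    age: int          # current integer age x_j
    entry_age: int    # age a_j when the fund was established
    accrual: float    # hypothetical pension rate earned per year of service

def accrued_benefit(p, age):
    """Hypothetical linear accrual: B(a) = 0, and B grows by p.accrual each year."""
    return p.accrual * max(age - p.entry_age, 0)

def stay_probability(years, q=0.02):
    """Hypothetical survival function: probability of staying active for `years` years."""
    return (1.0 - q) ** years

def accrued_liability(group, r, v, annuity):
    """The sum (2.2.1) over active participants (those with age < r)."""
    return sum(accrued_benefit(p, p.age) * v ** (r - p.age)
               * stay_probability(r - p.age) * annuity
               for p in group if p.age < r)

def normal_cost(group, r, v, annuity):
    """Total normal cost: the sum of (2.2.2) over active participants."""
    total = 0.0
    for p in group:
        if p.age >= r:
            continue
        delta_b = accrued_benefit(p, p.age + 1) - accrued_benefit(p, p.age)
        total += delta_b * v ** (r - p.age) * stay_probability(r - p.age) * annuity
    return total

group = [Participant(40, 30, 1000.0), Participant(55, 35, 1500.0)]
r, v, annuity = 65, 1 / 1.04, 12.0
print(accrued_liability(group, r, v, annuity), normal_cost(group, r, v, annuity))
```

Under linear accrual, a participant with ten years of service has an accrued liability exactly ten times her/his normal cost, which gives a quick sanity check of the two functions.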




▶ Let us consider it in more detail, working with r.v.'s rather than with their expected values. Suppose that we are at time t; so the set $A_t$ is fixed. Denote by $T = T_t$ the set of participants whose contracts will be terminated in the period [t, t+1], and by $R = R_t$ the set of all participants who will retire at time t+1.

Let the r.v. $I_{jt} = 1$ if the j-th contract will be terminated during the period mentioned, and $I_{jt} = 0$ otherwise. Then the set $T = \{j : I_{jt} = 1,\ j \in A_t\}$ and $R = \{j : x_j = r-1,\ I_{jt} = 0,\ j \in A_t\}$. Clearly,

$$A_t = A_{t+1} + T + R,$$

where $A_{t+1} = \{j : x_j < r-1,\ I_{jt} = 0,\ j \in A_t\}$. It is worthwhile to emphasize that from the standpoint of time t, the sets $A_{t+1}$, $T_t$, and $R_t$ are random.

Let $Y_j$ be the present value of the future pension payments for member j from the standpoint of the retirement time; so $E\{Y_j\} = \ddot a$. Let $\tilde I_{jt} = 1$ if participant j (of age $x_j$) is a member of the fund at the retirement age r, and $\tilde I_{jt} = 0$ otherwise.

Denote by $Y_t$ the (random) present value of the fund's liability at time t. (So, $E\{Y_t\} = (AL)_t$.) We have

$$Y_t = \sum_{j\in A_t} B_j(x_j)\, v^{r-x_j}\, \tilde I_{jt} Y_j = \sum_{j\in A_t} B_j(x_j+1)\, v^{r-x_j}\, \tilde I_{jt} Y_j - \sum_{j\in A_t} \Delta B_j\, v^{r-x_j}\, \tilde I_{jt} Y_j. \tag{2.2.3}$$

Denote the second sum in the far r.-h.s. of (2.2.3) by $G_{t1}$. Furthermore, the first sum in the far r.-h.s. of (2.2.3) splits as follows (the summand is the same in each sum):

$$\sum_{j\in A_t} = \sum_{j\in A_{t+1}} + \sum_{j\in T_t} + \sum_{j\in R_t}. \tag{2.2.4}$$

The first sum above equals

$$v \sum_{j\in A_{t+1}} B_j(x_j+1)\, v^{r-(x_j+1)}\, \tilde I_{jt} Y_j. \tag{2.2.5}$$

Let us observe that if $j \in A_{t+1}$ (i.e., the participant is active at time t+1), then $\tilde I_{jt} = \tilde I_{j,t+1}$, and the sum in (2.2.5) equals the r.v. $Y_{t+1}$. Then the whole expression is equal to $vY_{t+1}$.
For $j \in T_t$, the indicator $\tilde I_{jt} = 0$; so the second sum in (2.2.4) vanishes.
The third sum in (2.2.4) equals $v \sum_{j\in R_t} B_j(x_j+1)\, v^{r-(x_j+1)}\, \tilde I_{j,t+1} Y_j$. The last sum (denote it by $G_{t2}$) is the (random) present value from the standpoint of time t+1 of the total pension benefits to be paid for the participants who will retire at time t+1.
Collecting everything, we write

$$Y_t = vY_{t+1} + vG_{t2} - G_{t1}. \tag{2.2.6}$$

Let i be the risk-free interest rate. Then v = 1/(1+i), and we will rewrite (2.2.6) as

$$Y_{t+1} = (1+i)\,[Y_t + G_{t1}] - G_{t2}. \tag{2.2.7}$$


This is a balance equation. The accrued liability $Y_t$ may be viewed as the capital of the fund at time t needed for the fund to be viable. By time t+1, this capital will have grown to $(1+i)Y_t$, and the amount $G_{t2}$ will be withdrawn for the new retirees to purchase the desired annuities. From (2.2.7), it follows that for the fund to be still viable at time t+1, the amount $G_{t1}$ should be added at the beginning of the year t.

Certainly, rigorously speaking, all of this is impossible because from the standpoint of time t, the quantities $G_{t1}$ and $G_{t2}$ are random, and we do not know how much should be deposited. If we restrict ourselves to expected values, we will come to net characteristics. In a more sophisticated model, we would add to the expected values security loadings which depend on the lowest acceptable probability of viability.

The reader will readily double check that the expected value $E\{G_{t1}\}$ is equal to the normal cost $(NC)_t = \sum_{j\in A_t} (NC)_{tj}$, where the normal cost $(NC)_{tj}$ is given in (2.2.2). ◀

Now, let us consider the relative growth of the normal cost for one participant during a year; that is,

$$\frac{(NC)_{(t+1)j} - (NC)_{tj}}{(NC)_{tj}}.$$

If the age of a participant at time t is x, then the age at time t+1 is x+1. To make the result more explicit, instead of the last expression, let us consider the infinitesimal characteristic $\frac{1}{(NC)_{tj}} \frac{d}{dx}(NC)_{tj}$. In view of (2.2.2),

$$\frac{1}{(NC)_{tj}} \frac{d}{dx}(NC)_{tj} = \frac{d\ln (NC)_{tj}}{dx} = \frac{d\ln \Delta B_j(x)}{dx} + \frac{d\ln {}_{r-x}p_x}{dx} + \frac{d\ln v^{r-x}}{dx}.$$

(The index j in $x_j$ is suppressed.)

As usual, set $v = e^{-\delta}$ and ${}_{r-x}p_x = \exp\left\{-\int_x^r \mu(s)\,ds\right\}$. Then

$$\frac{d\ln (NC)_{tj}}{dx} = \frac{d\ln \Delta B_j(x)}{dx} + \mu(x) + \delta. \tag{2.2.8}$$

This is important: in the framework of the method under discussion, the normal costs are growing; moreover, they are growing faster than the accrued benefits. On the other hand, the latter is strongly connected with salary growth; so the contributions needed are growing faster than the salary.
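The growth rate in (2.2.8) can be checked on a toy case. The sketch below assumes a constant force of decrement `mu`, a constant annual benefit increment, and hypothetical parameter values; under these assumptions the logarithm of the normal cost is linear in age, so its one-year increase should equal $\mu + \delta$ exactly.

```python
import math

def log_normal_cost(x: float, r: float, delta: float, mu: float,
                    delta_b: float = 1.0, annuity_apv: float = 1.0) -> float:
    """ln of (2.2.2) with v = exp(-delta) and (r-x)p_x = exp(-mu*(r-x))."""
    return (math.log(delta_b) - delta * (r - x)
            - mu * (r - x) + math.log(annuity_apv))

# Hypothetical parameters, for illustration only.
delta, mu, r = 0.04, 0.01, 65

# With a constant benefit increment, d ln(Delta B)/dx = 0, so the growth
# rate of the normal cost reduces to mu + delta.
growth = log_normal_cost(41, r, delta, mu) - log_normal_cost(40, r, delta, mu)
print(growth, mu + delta)
```

Even with a flat benefit increment, the normal cost grows at the rate $\mu + \delta$ per year, which is the point made after (2.2.8).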


In Section 2.2.2, we fix this problem to some extent by considering a model with level 
costs, but for now we will stick to the current actuarial method and explore the expected 
value of all accumulated normal costs prior to time t. 

More specifically, we will show that as may be expected, 


In a balanced fund, the APV of accumulated prior normal costs equals the APV 
of accrued liability. 


To this end, we recall the definition of an accumulated value from Section 10.1.3; more precisely, the expected accumulated value of a temporary annuity, given that the annuitant survives the term of the annuity. The corresponding representation obtained in Section 10.1.3 was

$$\ddot s_{x:\overline{n}|} = \frac{\ddot a_{x:\overline{n}|}}{v^n\, {}_np_x}. \tag{2.2.9}$$

The term $\ddot a_{x:\overline{n}|}$ above concerns a level temporary annuity. To compute the APV of the accumulated prior individual normal costs (which are not level) for a participant at time t, we may use the same formula, replacing the numerator by the APV of the annuity with payments equal to normal costs. Namely, for a particular participant, omitting the index j, we replace $\ddot a_{x:\overline{n}|}$ in (2.2.9) by $\sum_{k=0}^{t-1} v^k\, {}_kp_a\, (NC)_{(k)}$, where $(NC)_{(k)}$ is the normal cost for the participant under consideration at time k < t, and a is the age at enrollment. For the moment, let us denote the survival function by s(x). Proceeding from (2.2.2), we have

$$\sum_{k=0}^{t-1} v^k\, {}_kp_a\, \big(B(a+k+1) - B(a+k)\big)\, v^{r-(a+k)}\, {}_{r-(a+k)}p_{a+k}\, \ddot a$$

$$= \sum_{k=0}^{t-1} v^k\, \frac{s(a+k)}{s(a)}\, \big(B(a+k+1) - B(a+k)\big)\, v^{r-a-k}\, \frac{s(r)}{s(a+k)}\, \ddot a$$

$$= \ddot a\, v^{r-a}\, \frac{s(r)}{s(a)} \sum_{k=0}^{t-1} \big(B(a+k+1) - B(a+k)\big) = \ddot a\, v^{r-a}\, \frac{s(r)}{s(a)}\, B(a+t).$$

(We used "telescoping" and the fact that B(a) = 0. Note also that B(a+t) = B(x).)

Thus, for the normal costs accumulated prior to time t, we use (2.2.9), replacing the numerator by the value obtained above, and the denominator by $v^{x-a}\, {}_{x-a}p_a$. Then, the APV in question is

$$\frac{\ddot a\, v^{r-a}\, \frac{s(r)}{s(a)}\, B(x)}{v^{x-a}\, \frac{s(x)}{s(a)}} = B(x)\, v^{r-x}\, \frac{s(r)}{s(x)}\, \ddot a = B(x)\, v^{r-x}\, {}_{r-x}p_x\, \ddot a,$$

which is consistent with (2.2.1).
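The telescoping argument can be verified numerically. The following sketch is only an illustration: the exponential survival function `s`, the linear accrual function `B` (with B(a) = 0), and all numbers are assumptions, not data from the text.

```python
import math

A_ENTRY = 30          # age at enrollment (hypothetical)

def s(x: float, mu: float = 0.01) -> float:
    """Assumed survival function: constant hazard mu."""
    return math.exp(-mu * x)

def B(x: float, b: float = 100.0) -> float:
    """Assumed accrued benefit rate, linear in service, with B(a) = 0."""
    return b * (x - A_ENTRY)

a, r, t = A_ENTRY, 65, 10
x = a + t                      # current age
v = 1.0 / 1.04                 # discount factor (hypothetical)
annuity = 12.0                 # stands in for the annuity APV at retirement

# APV at entry of the normal costs paid at times k = 0, ..., t-1.
apv_costs = sum(
    v ** k * s(a + k) / s(a)
    * (B(a + k + 1) - B(a + k))
    * v ** (r - (a + k)) * s(r) / s(a + k) * annuity
    for k in range(t)
)

# Accumulate to time t as in (2.2.9): divide by v^(x-a) * (x-a)p_a.
accumulated = apv_costs / (v ** (x - a) * s(x) / s(a))

# Accrued liability for the participant: B(x) v^(r-x) (r-x)p_x * annuity.
accrued_liability = B(x) * v ** (r - x) * s(r) / s(x) * annuity
print(accumulated, accrued_liability)
```

The two printed numbers coincide up to rounding, in line with the statement that the accumulated prior normal costs equal the accrued liability.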


2.2.2 The entry-age-normal method 


In the model of this subsection, we assume the contributions to be level in time. So, the 
notion of contribution is not derived from original premises of the model but is defined at 
the very beginning. Let $c_j$ be the (level) one-period contribution of participant j. We again
suppress j in the notation when considering a particular participant. 

For each participant, we require the APV of the total payments (from the standpoint of 
the initial time) to be equal to the APV of future benefits. That is, we require 


$$B(r)\, v^{r-a}\, {}_{r-a}p_a\, \ddot a = c\, \ddot a_{a:\overline{r-a}|}, \tag{2.2.10}$$

where $\ddot a_{a:\overline{r-a}|}$ is the APV of a unit-rate temporary annuity starting from age a and lasting no more than r − a years, and $\ddot a$ is the same as in Section 2.2.1.

The quantity c above is a normal cost, and relation (2.2.10) is a definition of normal cost 
in the framework of the actuarial cost method under discussion. 

We saw in Section 2.2.1 that under the unit-cost method, at any time t, the accrued liability equals the APV of prior costs. Now, we choose this property as a definition of accrued liability. That is, in accordance with the general formula (2.2.9), the accrued liability at time t of a participant of age x is

$$(AL)_t = c\, \ddot s_{a:\overline{x-a}|} = c\, \frac{\ddot a_{a:\overline{x-a}|}}{v^{x-a}\, {}_{x-a}p_a}. \tag{2.2.11}$$


To clarify this definition, we show that 


In a balanced fund, at any time, the accrued liability is equal to the APV of 
future benefits minus the APV of future costs. 


In other words, from the standpoint of any time, the APV of the prior costs plus the APV 
of future costs equal the APV of the future benefits. 

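To prove this, let us recollect that for x < r, we have $\ddot a_{a:\overline{x-a}|} = \ddot a_{a:\overline{r-a}|} - v^{x-a}\, {}_{x-a}p_a\, \ddot a_{x:\overline{r-x}|}$. Substituting c from (2.2.10) only in the first term below, we write

$$(AL)_t = c\, \frac{\ddot a_{a:\overline{x-a}|}}{v^{x-a}\, {}_{x-a}p_a} = c\, \frac{\ddot a_{a:\overline{r-a}|}}{v^{x-a}\, {}_{x-a}p_a} - c\, \frac{v^{x-a}\, {}_{x-a}p_a\, \ddot a_{x:\overline{r-x}|}}{v^{x-a}\, {}_{x-a}p_a}$$

$$= \frac{B(r)\, v^{r-a}\, {}_{r-a}p_a\, \ddot a}{\ddot a_{a:\overline{r-a}|}} \cdot \frac{\ddot a_{a:\overline{r-a}|}}{v^{x-a}\, {}_{x-a}p_a} - c\, \ddot a_{x:\overline{r-x}|} = B(r)\, v^{r-x}\, {}_{r-x}p_x\, \ddot a - c\, \ddot a_{x:\overline{r-x}|},$$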


which is exactly what we were trying to prove. 
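A small numerical check of this identity may be helpful. The sketch below assumes a constant force of decrement (so that ${}_kp_x$ does not depend on x) and uses hypothetical values for the ages, the interest rate, the projected benefit, and the annuity APV at retirement; it computes the level contribution from (2.2.10) and compares the retrospective form (2.2.11) of the accrued liability with the prospective one.

```python
import math

def p(years: float, mu: float = 0.01) -> float:
    """Survival probability over `years` under an assumed constant hazard."""
    return math.exp(-mu * years)

def temp_annuity_due(n: int, v: float, mu: float = 0.01) -> float:
    """APV of a unit n-year temporary annuity-due: sum of v^k * kp_x, k < n."""
    return sum(v ** k * p(k, mu) for k in range(n))

# Hypothetical inputs, for illustration only.
a, x, r = 30, 45, 65          # entry age, current age, retirement age
v = 1.0 / 1.04                # discount factor
annuity = 12.0                # APV of the pension annuity at retirement
benefit_at_r = 20000.0        # projected benefit rate B(r)

# Level contribution from (2.2.10).
c = benefit_at_r * v ** (r - a) * p(r - a) * annuity / temp_annuity_due(r - a, v)

# Retrospective form (2.2.11): accumulated value of prior contributions.
al_retrospective = c * temp_annuity_due(x - a, v) / (v ** (x - a) * p(x - a))

# Prospective form: APV of future benefits minus APV of future contributions.
al_prospective = (benefit_at_r * v ** (r - x) * p(r - x) * annuity
                  - c * temp_annuity_due(r - x, v))
print(c, al_retrospective, al_prospective)
```

Both forms of the accrued liability agree up to rounding error, which is the content of the boxed statement above.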

The above calculation concerned a single participant. Clearly, the same properties are 
true for total characteristics of the plan: for the total accrued liability we should add up the 
accrued liabilities for all participants, and the same is true for the total normal costs. 


3 EXERCISES 


Section 1 


1. (a) A pension plan for a company's employees provides a (future) annual income at retirement equal to a fraction of the final salary times the number of years in service. Suppose the future salary is projected to grow linearly, i.e., as $w_0(1 + \tau z)$, where $w_0$ is an initial salary, $\tau$ is a linear relative rate, and z is the number of in-service years. Considering the pension as a function of the time to retirement, with what type of functions are we dealing?


(b) Consider two new employees of ages 25 and 30, with respective salaries $50,000 and 
$60,000. The salary growth rates mentioned above for these participants are different: 
0.02 and 0.0125, respectively. The employees are friends and have decided to retire 
simultaneously. (i) At what time should they retire for their pensions to be the same? 
(ii) If the employees instead decided to retire at the same age, at what age should they 
retire to have the same pension? 


2. The current salaries of two employees of age 40 and 45 are $40,000 and $60,000, respec- 
tively. Their salaries are growing exponentially with respective rates 4% and 3%. The pro- 
jected pension is a fraction k of the final salary, and retirement is possible after age 60. 


(a) If the employees retire at the same age, what should this age be for their pensions to be 
the same? 


(b) If the employees retire at the same time, when should it happen for their pensions to be the same? If the answer is unrealistic, change the employees' ages to make the answer realistic.




3. Suppose that (1.1.4)-(1.1.5) are true, and the function sy is linear.


(a) If o = 2, what is sọ? If Sa Æ 2, what is the structure (or type) of the function aa 


(b) Let w) (40,5) = 1.1w (40). Find s,. Explain why it suffices to have a solution up to 
a constant multiplier. 


4. In the case of Example 1.1.1-2, find the probability that the participant will retire. Write a general formula for this probability.


5. (a) In the framework of Section 1.1.1, find a formula for the APV of future pension benefits for a plan for which all participants must retire at an age r. (b) What should we change in Example 1.1.1-2 for this example to satisfy the above condition? (c) Find the particular value of the APV in this case.


6. In the situation of Section 1.1.1, among two participants having identical salaries at each age, the first has smaller death and withdrawal hazard rates for all ages. Do you expect the APV of pension benefits for the first participant to be larger? If not, which additional condition should be imposed for this to be true?


7. In the situation of Example 1.1.1-2, consider the case of constant hazard rates and the presence of a beneficiary (see remarks in Section 1.1.1). More specifically, assume that the


decrement rate fig (y) = u for y < r— x and fi (y) = 0 for y > r — x; the retirement hazard 


rate (including the force of mortality) us? (y) =0 for y < r—x and us? (y) =u fory >r—x. 
When computing the APV of the pension annuity from the standpoint of the retirement time, 


we proceed from a constant mortality rate u2. 


(a) How will (1.1.3) look in this case? 


(b) For the data in Example 1.1.1-2 with u; = 0.1 and u2 = 0.05, find the particular value 
of the APV. 


8. (a) Find the limit of the r.-h.s. of (1.1.13) as y → ∞. Comment on the result from an economic point of view. (b) For a finite y, how should (1.1.13) be changed if instead of the initial salary $w_0$, we are given a salary w at age x? (c) Let y = 1. What will change in (1.1.1) if we replace the final salary with the average salary in the sense of (1.1.13)?


9. How would formula (1.2.3) change in the case p = B? Is the r.-h.s. of (1.2.3) still positive if p < B? Explain that the restriction p > B is reasonable from an economic point of view. (Hint: For p < B, the investment growth is smaller than the salary growth.)


10. In the situation of Section 1.2, consider the case where the future retiree does not need more
than an amount W for comfortable (in her/his opinion) living. More specifically, assume that 
the contribution rate is equal to a constant c < 1 up to the moment when the income remaining 
after the contribution in the retirement account reaches the level w. After that the participant 
contributes the entire surplus into the retirement account. Assume also w(t) = woe! and 
p(t) =p. Write the integral in (1.2.1) in this case. When will it differ from (1.2.3)? You may 
also provide particular calculations of the integral which are simple but a bit cumbersome. 


11. (a) Is the expression in (1.2.4) increasing in d? Justify the answer mathematically and give
a common-sense explanation. 


(b) In Example 1.2.2-1, provide formulas and calculations for d = 1. Did you expect a 
larger or smaller pension rate in comparison with d < 1 ? 




Section 2 


12. Let t > r − a. (a) Find the payroll function $W_t$ for a stationary population and a flat wage, and express your answer in terms of the person-years characteristics $T_x$ from Section 7.1.5.1.
(b) Find W, in the exponential case w(x,t) = w(x)e™, w(x) = woe®@-, n = noe™, sa(x) = 
et-a), 


13. What is the r.-h.s. of (2.1.7) (we denoted it by @”)) equal to if a = 8?


14. This exercise concerns the model of Section 2.1. (a) Will the increase of the retirement age
cause a decrease of the normal costs? (Advice: Analyze factors reflecting the influence of the 
retirement age in (2.1.8).) 


(b) Certainly, a growth of the enrollment intensity $\alpha$ must lead to a growth of normal costs.
Verify that this is reflected in (2.1.8). 


15. (a) Analyze the dependence of the benefit rate in (2.1.15) on retirement age and enrollment
intensity. (Advice: Regarding the latter parameter, it is more convenient to look at the original 
representation (2.1.13).) 


(b) Analyze the dependence of the APV in (2.1.18) on retirement age and enrollment inten- 
sity. (Advice: Regarding the latter parameter, it is more convenient to look at the original 
representation (2.1.16).) 


16. Rework Example 2.1.3-1 for the simple case when the survival function $s_a(x) = \exp\{-y(x - a)\}$, t = 0, and h(x) = 1.


17. Not providing complete calculations, say how the formulas and examples of Section 2.1 will
change if the initial pension rate equals a fraction k of the salary at the moment of retirement 
times the number of years in service. 


18. (a) Suppose that in (2.2.8), the projected pension rate B(x) is a fraction k of the current
salary w(x), and w(x) is growing exponentially (in discrete time); more precisely, w(x +1) = 
w(x)(1 +). Provide the expression (2.2.8) in this case. 


(b) Do the same for the continuous-time approximation w(x) = w(a)e? 0-9, 


19. In the framework of Section 2.2.1, prove that if $\Delta B$ is constant, then $(NC)_{(t+1)j} = (1+i)(NC)_{tj}/p_{x_j}$, where i is a one-period effective interest rate. Find the relation between the total normal costs $(NC)_{t+1}$ and $(NC)_t$, if we only include participants who will not retire in the next year.


20. In the framework of Section 2.2.2, find the one-period contribution c and accrued liability for a participant of age x < r at time t if the pension rate equals a fraction k of the salary at the retirement time (denote this salary by w(r)), multiplied by the number of years in the plan. The mortality force is a constant $\mu$; the risk-free rate is $\delta$, instead of $\ddot a$ we use $\bar a_r$, and all other APV's of annuities are replaced by their continuous-time counterparts. Find also AL for the case x = r and comment on the result.


Chapter 12 


Risk Exchange: Reinsurance and 
Coinsurance 


The process of redistribution of risk, starting with purchasing insurances by individual 
clients, continues at the next level: insurance companies redistribute the risk they incurred 
between themselves. Such a risk redistribution may be even more flexible than that at the 
first level: the companies may share individual risks in different ways or redistribute total 
accumulated risk. A common practice is to protect the portfolio against excessive claims 
although there are many other forms of reinsurance. 

A company which reinsures a part of its risk plays the role of a cedent, while the company 
which assumes this part is a reinsurer. Section 1 concerns optimal forms of reinsurance and 
the amount to be retained from the standpoint of the cedent. 

In Section 2, we consider the negotiation process between two companies sharing a risk. 
In this case, each company is a cedent and reinsurer simultaneously, and the result of the 
negotiation is based on principles equally acceptable to both companies. 

At least theoretically, these principles are not necessarily connected with payments for 
reinsurance. The negotiation may be direct, and the companies may agree with a certain 
form of risk distribution without paying each other for reinsurance. However, when many 
companies are simultaneously involved in reinsurance, market price mechanisms are the 
most, if not the only, realistic mechanisms of exchanging risk. 

Actual reinsurance practice contains various combinations of forms of reinsurance: mutual agreements on direct reinsurance, trading risks, and financial products such as special options, futures, or bonds. (See, e.g., [28], [37, Section 8.7], and references therein.) How-
ever, in any case, this is a market where commodities to be exchanged are risks. We touch 
on this question in Section 3. 

The exposition below may be called fragmentary. Our goal is not to build a comprehen- 
sive theory but rather to make the reader acquainted with some basic notions and to give 
some examples. 


1 REINSURANCE FROM THE STANDPOINT 
OF A CEDENT 


1.1 Some optimization considerations 


We identify below risks and the r.v.’s of future payments and sometimes even use the 
term “risk” when talking about r.v.’s. 




1.1.1 Expected utility maximization 


Let X be the r.v. of the future payment of a company regarding either a particular policy or 
a risk portfolio, and c be the premium corresponding to risk X. The reinsurance procedure 
is specified by a retention function R(x) in the following way. The company retains an 
amount R(X) and purchases a reinsurance coverage for the risk $X_{\mathrm{reins}} = X - R(X)$. The
function R(x) specifies a particular type of reinsurance. 

Assume the following. 


• The expected value
$$E\{X_{\mathrm{reins}}\} = E\{X - R(X)\} = \lambda, \tag{1.1.1}$$
where $\lambda$ is a fixed quantity. For example, the reinsurer agrees to reinsure risks only with a given mean value $\lambda$.

• The company pays the reinsurance premium
$$c_{\mathrm{reins}} = (1 + \theta_{\mathrm{reins}})\, E\{X_{\mathrm{reins}}\} = (1 + \theta_{\mathrm{reins}})\lambda,$$
where $\theta_{\mathrm{reins}}$ is a fixed reinsurance loading coefficient.

• The optimal reinsurance corresponds to the maximization of the expected utility of the company's surplus (or wealth) for a given utility function u(x) and an initial surplus w.


Then the expected utility of the company's future surplus is
$$E\{u(w + c - c_{\mathrm{reins}} - R(X))\}. \tag{1.1.2}$$
Comparing (1.1.2) with the quantity
$$E\{u(w - G - \xi + r(\xi))\}, \tag{1.1.3}$$
which we considered in Section 1.5, we see that, although these quantities represent different economic situations, they are mathematically identical. It suffices to establish the correspondence $\xi = X$, $G = c_{\mathrm{reins}} - c$, and

$$r(x) = x - R(x). \tag{1.1.4}$$


Moreover, the restriction (1.1.1) coincides with the restriction (1.5.1.1). 
We proved in Section 1.5 that if u(·) is an increasing and concave function, then the maximum of (1.1.3) is attained at the function

$$r(x) = r_d(x) = \begin{cases} 0 & \text{if } x \le d, \\ x - d & \text{if } x > d, \end{cases} \tag{1.1.5}$$

where d is specified by the condition (1.5.1.1) [or (1.1.1), which is the same]. Combining (1.1.4) and (1.1.5), we see that the maximum of (1.1.2) is attained at the function

$$R(x) = R_d(x) = \begin{cases} x & \text{if } x \le d, \\ d & \text{if } x > d. \end{cases} \tag{1.1.6}$$


Thus, the retained risk R(X) is the result of truncation of the original risk at level d. This 
type of reinsurance is called excess-of-loss reinsurance if it concerns each contract sep- 
arately, and stop-loss reinsurance when it is applied to the whole risk portfolio. In both 
cases, the company fixes a certain level of claims and reinsures the amount of the claim 
which exceeds this level. 
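In code, the split of a claim between the cedent and the reinsurer under (1.1.5) and (1.1.6) is a one-liner each. The function names below are hypothetical and serve only to illustrate the truncation at level d.

```python
def retained(x: float, d: float) -> float:
    """Retention function R_d(x) of (1.1.6): the claim truncated at level d."""
    return min(x, d)

def reinsured(x: float, d: float) -> float:
    """Reinsured part r_d(x) = x - R_d(x) of (1.1.5): the excess over d."""
    return max(x - d, 0.0)

# A claim below the level is fully retained; a claim above it is split.
small_claim = (retained(3.0, 5.0), reinsured(3.0, 5.0))
large_claim = (retained(8.0, 5.0), reinsured(8.0, 5.0))
print(small_claim, large_claim)
```

Applied to each contract separately this corresponds to excess-of-loss reinsurance; applied to the total payment of the portfolio it corresponds to stop-loss reinsurance.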

As in Section 1.5, it is worth noting that rule (1.1.6) is the same for all concave util- 
ity functions. So, to specify a particular reinsurance policy, we do not have to know the 
utility function of the company; actually it would be rather naive to think that such a func- 
tion exists. We just assume that the preferences of the company are close to those based 
on expected utility maximization for some utility function, perhaps different in different 
situations. 


EXAMPLE 1. Consider a homogeneous portfolio of n independent risks. The payment 
for each risk has an exponential distribution which we assume, without loss of generality, 
to be standard. The insurance security loading is 5%, the reinsurance loading is 10.5%. 

The company decides to spend 20% of the premium on stop-loss reinsurance. This implies that, if X is the total risk (payment), and $X_{\mathrm{reins}}$ is the risk to be reinsured, then $(1 + \theta_{\mathrm{reins}})E\{X_{\mathrm{reins}}\} = 0.2(1 + \theta)E\{X\}$, where $\theta = 0.05$ and $\theta_{\mathrm{reins}} = 0.105$ are the corresponding loading coefficients. Thus, $(1 + 0.105)E\{X_{\mathrm{reins}}\} = 0.2(1 + 0.05)E\{X\}$, from which it follows that

$$E\{X_{\mathrm{reins}}\} = kE\{X\}, \text{ where } k \approx 0.19. \tag{1.1.7}$$

In accordance with what we have proved above, the optimal risk to be reinsured is $X_{\mathrm{reins}} = X - R_d(X)$.

To specify the level d, it is more convenient not to use (1.5.1.4) but rather to write the expectation of $R_d$ directly as follows:

$$E\{R_d(X)\} = \int_0^d x\,dF(x) + d \cdot P(X > d) = \int_0^d x\,dF(x) + d(1 - F(d)), \tag{1.1.8}$$

where F(x) is the d.f. of X.

In our case, $X = X_1 + \ldots + X_n$, where $X_i$ is the payment corresponding to the ith risk and having the standard exponential distribution. Hence, F is the Γ-distribution with parameters (1, n), and

$$E\{R_d(X)\} = \frac{1}{\Gamma(n)} \int_0^d x \cdot x^{n-1} e^{-x}\,dx + d \cdot \frac{1}{\Gamma(n)} \int_d^\infty x^{n-1} e^{-x}\,dx$$

$$= \frac{\Gamma(n+1)}{\Gamma(n)} \cdot \frac{1}{\Gamma(n+1)} \int_0^d x^{(n+1)-1} e^{-x}\,dx + d \cdot \frac{1}{\Gamma(n)} \int_d^\infty x^{n-1} e^{-x}\,dx$$

$$= n\Gamma(d; n+1) + d(1 - \Gamma(d; n)), \tag{1.1.9}$$

where

$$\Gamma(t; n) = \frac{1}{\Gamma(n)} \int_0^t x^{n-1} e^{-x}\,dx,$$

the Γ-d.f. with parameters (1, n). If n is an integer, $\Gamma(t; n)$ may be written explicitly; see (4.2.1.3) where the right member is the Γ-d.f. with parameters (λ, n).

Since $X_{\mathrm{reins}} = X - R_d(X)$, from (1.1.7) it follows that $E\{R_d(X)\} = (1 - k)E\{X\} = (1 - k)n = 0.81n$. Combining it with (1.1.9), we get an equation for d:

$$n\Gamma(d; n+1) + d(1 - \Gamma(d; n)) = 0.81n. \tag{1.1.10}$$

With the use of good software, it is not difficult to estimate the solution d = d(n). For example, d(10) ≈ 8.789, which the reader may verify on her/his own. (For instance, in Excel, $\Gamma(t; n)$ is in the list of functions; in Maple, the function GAMMA(n,t) $= \Gamma(n)[1 - \Gamma(t; n)]$.)
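The solution of (1.1.10) can also be found with a few lines of standard-library Python. The sketch below is one possible approach, not the method used in the text: it relies on the closed form of the Γ-d.f. for integer n and on plain bisection, and the function names are hypothetical.

```python
import math

def gamma_cdf(t: float, n: int) -> float:
    """Gamma d.f. with parameters (1, n), integer n:
    1 - exp(-t) * sum_{k=0}^{n-1} t^k / k!."""
    term, total = 1.0, 0.0
    for k in range(n):
        total += term
        term *= t / (k + 1)
    return 1.0 - math.exp(-t) * total

def expected_retained(d: float, n: int) -> float:
    """Left-hand side of (1.1.10): n*Gamma(d; n+1) + d*(1 - Gamma(d; n))."""
    return n * gamma_cdf(d, n + 1) + d * (1.0 - gamma_cdf(d, n))

def solve_retention(n: int, share: float = 0.81, tol: float = 1e-10) -> float:
    """Bisection for d in (1.1.10); the left-hand side is increasing in d."""
    lo, hi = 0.0, 10.0 * n + 50.0
    target = share * n
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_retained(mid, n) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k = 0.2 * 1.05 / 1.105      # the share in (1.1.7), about 0.19
d10 = solve_retention(10)   # stop-loss level for the portfolio of 10 risks
d1 = solve_retention(1)     # level for a single risk; equals -ln(0.19)
print(round(k, 4), round(d10, 3), round(d1, 3))
```

The same routine with n = 1 gives the per-risk level used in the excess-of-loss comparison later in this example.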

The expected profit after reinsurance is 


$$0.8(1 + \theta)E\{X\} - E\{R_d(X)\} = 0.8(1 + \theta)E\{X\} - (1 - k)E\{X\} \approx 0.8(1 + 0.05)E\{X\} - 0.81E\{X\} = 0.03E\{X\}.$$

So, after the reinsurance operation, the average return (profit) is reduced from 5% (since $\theta = 0.05$) to 3%. This is the payment for stabilization.

It is interesting to compare it with the result for the excess-of-loss reinsurance where we truncate the payment r.v.'s for each risk separately. In this case, we should consider the equation (1.1.10) for n = 1, which corresponds to the standard exponential distribution. It is easy to verify (using just a calculator) that the solution is $d_1 \approx 1.660$. We see that $10d_1 > d(10)$, which is to be expected. It can be shown that the same is true for all n.

Note that in the excess-of-loss case the average return is, certainly, the same as in the 
stop-loss case: 


$$0.8(1 + \theta)E\{X\} - \sum_{i=1}^n E\{R_{d_1}(X_i)\} = 0.8(1 + \theta)E\{X\} - \sum_{i=1}^n (1 - k)E\{X_i\} = 0.8(1 + \theta)E\{X\} - (1 - k)E\{X\},$$

since $X = X_1 + \ldots + X_n$. So, the difference is in the level of stability.
In Exercise 1, we compute the variance of the retained payment in both cases, determine 
which is larger, and interpret the answer. 


1.1.2 Variance as a measure of risk 


In this subsection we follow mainly the ideas from [11, Sec.5.1]; see also references 
therein. 

It is interesting that we will come to the same reinsurance strategy (1.1.6) if we minimize the variance of the retained risk, Var{R(X)}, given the mean value E{R(X)}. Note that under condition (1.1.1), $E\{R(X)\} = m - \lambda$, where m = E{X}.


Proposition 1 For any R(x) with $0 \le R(x) \le x$, and
$$E\{R(X)\} = \beta \tag{1.1.11}$$
for a fixed $\beta$,
$$\mathrm{Var}\{R(X)\} \ge \mathrm{Var}\{R_d(X)\},$$
where $R_d(x)$ is defined in (1.1.6), and d is determined by the condition $E\{R_d(X)\} = \beta$.


A way to find d is given in (1.1.8); see also Section 1.5. A proof of Proposition 1 is 
relegated to the end of this subsection. 


In the next setup, the role of variance differs from what we considered above, which will 
lead to a different optimal strategy. 

Assume that the cedent specifies somehow what the retained risk should be on the average, and what variance of the retained risk is acceptable. More precisely, the cedent sets $E\{R(X)\} = \beta$, $\mathrm{Var}\{R(X)\} = \sigma^2_{\mathrm{ret}}$, where $\beta$ and $\sigma^2_{\mathrm{ret}}$ are fixed numbers chosen by the cedent. (The index "ret" is the abbreviation of "retained".)

In turn, the reinsurer charges a reinsurance premium which is determined by the expected value of the risk reinsured and its variance. In other words, the reinsurance premium $c_{\mathrm{reins}}$ is a function of $E\{X_{\mathrm{reins}}\}$ and $\mathrm{Var}\{X_{\mathrm{reins}}\}$.

In this case, $E\{X_{\mathrm{reins}}\} = m - \beta$ and is also fixed. Consequently, in order to minimize the cost of reinsurance given $\beta$ and $\sigma^2_{\mathrm{ret}}$, the cedent should minimize $\mathrm{Var}\{X_{\mathrm{reins}}\}$.

Proposition 2 Under the conditions stated above, $\mathrm{Var}\{X_{\mathrm{reins}}\}$ attains its minimum at the function
$$R(X) = rX, \text{ where r is some non-negative number.} \tag{1.1.12}$$

The reinsurance of the type (1.1.12) is called proportional or quota share reinsurance. Since $R(x) \le x$, the retention coefficient $r \le 1$.


Let us turn to proofs. 


Proof of Proposition 1. Because $\mathrm{Var}\{R(X)\} = E\{R^2(X)\} - (E\{R(X)\})^2$ and E{R(X)} is fixed, instead of minimizing Var{R(X)}, we may minimize $E\{R^2(X)\}$. Now,

$$E\{R^2(X)\} = E\{[R(X) - d + d]^2\} = E\{[R(X) - d]^2\} + 2dE\{R(X) - d\} + d^2 = E\{[R(X) - d]^2\} + 2dE\{R(X)\} - d^2.$$

The last two terms are fixed, and d is specified by the condition $E\{R_d(X)\} = \beta$. Hence, it suffices to find the minimum of $E\{[R(X) - d]^2\}$. We have

$$E\{[R(X) - d]^2\} = E\{[R(X) - d]^2;\ X \le d\} + E\{[R(X) - d]^2;\ X > d\} \tag{1.1.13}$$
$$\ge E\{[R(X) - d]^2;\ X \le d\}. \tag{1.1.14}$$

(The symbol E{X; B} means integration over a set B; formally, E{X; B} = E{X | B}P(B).)
From (1.1.13) and (1.1.6) it follows that

$$E\{[R_d(X) - d]^2\} = E\{[X - d]^2;\ X \le d\} + 0 = E\{[X - d]^2;\ X \le d\}.$$

From (1.1.14) we get that for any R(X) such that $0 \le R(X) \le X$,

$$E\{[R(X) - d]^2\} \ge E\{[R(X) - d]^2;\ X \le d\} \ge E\{[X - d]^2;\ X \le d\} = E\{[R_d(X) - d]^2\}.$$

(The second inequality holds because on the set $\{X \le d\}$ we have $R(X) \le X \le d$.) Comparing the leftmost and the rightmost terms, we see that $R_d(X)$ minimizes $E\{[R(X) - d]^2\}$ over all R(X) under consideration. ∎
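Proposition 1 can be illustrated on the standard exponential risk, for which the moments of the truncated payment are available in closed form. The comparison below, between the stop-loss retention and a proportional retention with the same mean, is only an example under that distributional assumption; the function name is hypothetical.

```python
import math

def stop_loss_moments(d: float) -> tuple[float, float]:
    """Mean and variance of min(X, d) for a standard exponential X.
    E{min(X,d)} = 1 - e^{-d};  E{min(X,d)^2} = 2(1 - e^{-d} - d e^{-d})."""
    mean = 1.0 - math.exp(-d)
    second_moment = 2.0 * (1.0 - math.exp(-d) - d * math.exp(-d))
    return mean, second_moment - mean ** 2

d = -math.log(0.19)                   # level giving the retained mean 0.81
beta, var_stop_loss = stop_loss_moments(d)

# Proportional retention R(X) = rX with the same mean: r * E{X} = beta, E{X} = 1.
r = beta
var_proportional = r ** 2             # Var{rX} = r^2 * Var{X}, Var{X} = 1
print(beta, var_stop_loss, var_proportional)
```

With the same retained mean 0.81, the truncated payment has roughly half the variance of the proportional one, in agreement with the proposition.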


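Proof of Proposition 2. We write

$$\mathrm{Var}\{X_{\mathrm{reins}}\} = \mathrm{Var}\{X - R(X)\} = \mathrm{Var}\{X\} + \mathrm{Var}\{R(X)\} - 2\sqrt{\mathrm{Var}\{X\} \cdot \mathrm{Var}\{R(X)\}}\ \mathrm{Corr}\{X, R(X)\} = \sigma_X^2 + \sigma^2_{\mathrm{ret}} - 2\sigma_X \sigma_{\mathrm{ret}}\, \mathrm{Corr}\{X, R(X)\},$$

where $\sigma_X^2 = \mathrm{Var}\{X\}$, and Corr{X, R(X)} is the correlation coefficient; see Section 0.2.4.3. Since $\sigma_X$ and $\sigma_{\mathrm{ret}}$ are fixed, the variance is minimal when the correlation is maximal. As was noted in Section 0.2.4.3, the correlation attains its maximum (which is equal to one) when R(X) = rX for some r > 0. ∎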


1.2 Proportional reinsurance: Adding a new contract to an exist- 
ing portfolio 


1.2.1 The case of a fixed security loading coefficient 


Consider a portfolio whose risk is represented by a r.v. X with mean m and variance $\sigma^2$. Denote by c the total premium of the portfolio. We set

$$c = (1 + \theta)m \tag{1.2.1}$$

and assume that the above security loading coefficient $\theta$ is the same for all portfolios or separate risks under consideration.

Suppose that the portfolio has already gone through all preliminary adjustments such as 
reinsurance, an arrangement of the reserve fund, etc., so the company dealing with this port- 
folio is satisfied with its level of security (or at least the company optimized the portfolio 
riskiness). 

As we have done repeatedly, let us write the probability that the payment will not exceed 
the total premium, which is 


$$P(c - X \ge 0) = P(X - m \le \theta m) = P\left(X^* \le \frac{\theta m}{\sigma}\right) = P\left(X^* \le \frac{\theta}{k}\right),$$

where the normalized r.v. $X^* = (X - m)/\sigma$, and $k = \sigma/m$, the coefficient of variation (see also Section 2.3.1.1).

The loading coefficient $\theta$ is fixed. If X were normal, the probability we consider would have been completely determined by k. Hence, in this case we could view k as a risk measure: a smaller k would indicate a greater probability above.

If X is not normal but represents a large portfolio, we may assume X to be close to a 
normal r.v. by the CLT. Then k may be viewed as a characteristic of riskiness up to normal 
approximation. 

Note that if X* had had another standard distribution distinct from normal, we could rea- 
son similarly. However, such a generalization would be rather formal, so the main argument 
for choosing k as a risk measure is based on the assumption of approximate normality. 

Suppose now that a new risk with a random future payment $X_0$ is added to the portfolio. Set $m_0 = E\{X_0\}$, $\sigma_0^2 = \mathrm{Var}\{X_0\}$, and denote by $c_0$ the premium corresponding to the new risk. As we agreed, in our scheme, $c_0 = (1 + \theta)m_0$, where $\theta$ is the same as in (1.2.1).

The company is going to retain the risk $rX_0$ and reinsure $(1 - r)X_0$, where r is a retention coefficient. The goal is to find a "reasonable" $r \le 1$. If this reasonable r turns out to be less than one, then the company should reinsure a part of the new risk.


FIGURE 1. (Graph of k(r); the point $r_0$ is marked on the r-axis.)


Assume that X and $X_0$ are independent, and suppose that, in order to reinsure $(1 - r)X_0$, the company should yield exactly the corresponding share of the premium, that is, $(1 - r)c_0$. This is not a mild assumption, but we adopt it here.

In this case, the future payment for the new portfolio is the r.v. $X_r = X + rX_0$ with the expected value $m_r = m + rm_0$ and the variance $\sigma_r^2 = \sigma^2 + r^2\sigma_0^2$. The premium for the new portfolio is $c_r = c + rc_0 = (1 + \theta)m + r(1 + \theta)m_0 = (1 + \theta)m_r$. Hence, the expected profit after reinsurance is equal to $c_r - m_r = \theta m_r = \theta(m + rm_0)$ and is increasing with r.

Thus, regarding the mean profit, the company wants r to be as large as possible. The principle we accept consists in choosing the largest r for which the riskiness of the new portfolio is not larger than the riskiness of the original portfolio. The riskiness in our scheme is completely characterized by the new coefficient of variation

$$k(r) = \frac{\sigma_r}{m_r} = \frac{\sqrt{\sigma^2 + r^2\sigma_0^2}}{m + rm_0}. \tag{1.2.2}$$

Thus, we choose the largest r for which $k(r) \le k$.
Elementary calculations show that

• $k(0) = k$, and $k(r) \to k_0 = \dfrac{\sigma_0}{m_0}$ as $r \to \infty$;

• $k'(r) < 0$ for $r < r_0 = \dfrac{\sigma^2 m_0}{\sigma_0^2 m}$, and $k'(r) > 0$ for $r > r_0$;

• the equation k(r) = k has a positive solution iff $k < k_0$, and this solution is

$$r_1 = 2k^2 m\, \frac{1}{m_0(k_0^2 - k^2)}. \tag{1.2.3}$$


We consider three cases. 


CASE 1: $k \ge k_0$; see Fig. 1. We see that in this case, k(r) < k for all r > 0. Consequently, the best r = 1, and reinsurance is not needed.

FIGURE 2. (Panels (a) and (b), each showing the graph of k(r).)

CASE 2: $k < k_0$, and $r_1 < 1$; see Fig. 2a. In this case, we choose $r = r_1$. We have $k(r_1) = k$, and $r_1$ is the biggest r for which $k(r) \le k$. So, the riskiness of the new portfolio is the same as before, but because $r_1 > 0$, after adding the corresponding part of the new risk, the average profit of the portfolio has increased.

CASE 3: $k < k_0$, and $r_1 \ge 1$; see Fig. 2b. In this case, we take r = 1, that is, reinsurance is again not needed.
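The three cases translate into a short decision rule. The sketch below assumes the formula $r_1 = 2k^2 m / (m_0(k_0^2 - k^2))$ from (1.2.3); the function names and the numerical inputs are hypothetical and serve only as an illustration.

```python
import math

def coeff_var(r: float, m: float, sigma: float, m0: float, sigma0: float) -> float:
    """Coefficient of variation k(r) of X + r*X0, as in (1.2.2)."""
    return math.sqrt(sigma ** 2 + r ** 2 * sigma0 ** 2) / (m + r * m0)

def reasonable_retention(m: float, sigma: float, m0: float, sigma0: float) -> float:
    """Largest r <= 1 with k(r) <= k, following Cases 1-3."""
    k, k0 = sigma / m, sigma0 / m0
    if k >= k0:
        return 1.0                                    # Case 1: no reinsurance
    r1 = 2.0 * k ** 2 * m / (m0 * (k0 ** 2 - k ** 2))  # positive root of k(r) = k
    return min(r1, 1.0)                               # Case 2 (r1 < 1) or Case 3

# Hypothetical data: a stable portfolio and a comparatively risky new contract.
m, sigma = 1000.0, 100.0       # k = 0.1
m0, sigma0 = 10.0, 50.0        # k0 = 5
r = reasonable_retention(m, sigma, m0, sigma0)
k_after = coeff_var(r, m, sigma, m0, sigma0)
print(r, k_after)

# A new risk with the same parameters as each of n = 50 i.i.d. portfolio risks
# (unit mean and unit variance): here r1 exceeds one, so the rule returns 1.
r_same = reasonable_retention(50.0, math.sqrt(50.0), 1.0, 1.0)
print(r_same)
```

In the first call the retained share is about 8% of the new risk, and the coefficient of variation of the enlarged portfolio equals the original k = 0.1.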


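EXAMPLE 1. Let the original portfolio consist of n i.i.d. risks $X_i$ with $E\{X_i\} = \bar m$, $\mathrm{Var}\{X_i\} = \bar\sigma^2$. Then $m = n\bar m$, $\sigma^2 = n\bar\sigma^2$, $k = \dfrac{\sigma}{m} = \dfrac{\bar\sigma}{\sqrt{n}\,\bar m} \to 0$ as $n \to \infty$. Hence, $k < k_0 = \dfrac{\sigma_0}{m_0}$ for sufficiently large n. The retention coefficient

$$r_1 = 2k^2 m\, \frac{1}{m_0(k_0^2 - k^2)} = \frac{2\bar\sigma^2}{\bar m} \cdot \frac{1}{(\sigma_0^2/m_0) - m_0\bar\sigma^2/(\bar m^2 n)} \sim \frac{2\bar\sigma^2 m_0}{\bar m\, \sigma_0^2} \text{ as } n \to \infty. \tag{1.2.4}$$

Thus, for large n, reinsurance is needed if $2\bar\sigma^2 m_0 < \bar m\, \sigma_0^2$.

Now let the new risk have the same parameters as the risks composing the original portfolio: $m_0 = \bar m$, $\sigma_0 = \bar\sigma$. Then, it is straightforward to calculate that $r_1 = 2n/(n-1) > 1$, and no reinsurance is needed.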


EXAMPLE 2. Let the new risk $X_0$ take on two values: $v_0$ (the sum insured) and 0 with
probabilities q and 1 − q, respectively. The insurer wants to retain a part r of the payment
$v_0$ and to reinsure the rest. In this case, the retained risk corresponds to a r.v. taking values $v_{\mathrm{retained}} = r v_0$ and
0 with probabilities q and 1 − q, respectively. The part $(1 - r)v_0$ is reinsured. We have
$m_0 = v_0 q$, $\sigma_0 = v_0\sqrt{q(1-q)}$, the “reasonable” retention $r = r_1$, and in view of (1.2.4),

$$r_1 = 2k^2 m \cdot \frac{1}{(\sigma_0^2/m_0) - k^2 m_0} = 2k^2 m \cdot \frac{1}{v_0[(1-q) - k^2 q]} = 2k^2 m \cdot \frac{1}{v_0[1 - (1+k^2)q]}.$$

Then the retained possible payment

$$v_{\mathrm{retained}} = r_1 v_0 = 2k^2 m \cdot \frac{1}{1 - (1+k^2)q} \approx 2k^2 m \tag{1.2.5}$$

for small q.




This is an old celebrated formula known from the beginning of the previous century, if 
not from an earlier time. The quantity m is interpreted as the net premium for the original 
portfolio and is denoted sometimes by P. So, the formula becomes 


$$v_{\mathrm{retained}} = 2k^2 P.$$
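
To see how close the classical rule is to the exact expression (1.2.5), one may compare the two numerically. The sketch below is illustrative only; the function names are ours, and the exact formula is used under the assumption that the coefficient of variation of the portfolio is smaller than that of the new risk, which is equivalent to 1 − (1 + k²)q > 0.

```python
def retained_payment(k, net_premium, q):
    """Exact retained payment r1*v0 from (1.2.5) for a two-valued risk."""
    denominator = 1.0 - (1.0 + k ** 2) * q
    if denominator <= 0.0:
        # Equivalent to k >= k0, where the formula does not apply.
        raise ValueError("the formula requires 1 - (1 + k^2) q > 0")
    return 2.0 * k ** 2 * net_premium / denominator


def retained_payment_small_q(k, net_premium):
    """Classical approximation 2 k^2 P, valid for small q."""
    return 2.0 * k ** 2 * net_premium
```

With k = 0.1, P = 1,000,000 and q = 0.01, the approximation gives 20,000, and the exact value is about one percent larger.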


Route 2 => page 563 


1.2.2 The case of the standard deviation premium principle 


In this section, we follow mainly [123]. Consider the same problem with the following 
two changes. First, for a risk X with mean m and variance $\sigma^2$, we adopt, as the premium
principle, the rule

$$c = m + \lambda\sigma, \tag{1.2.6}$$

where λ plays a role of loading. Second, we do not assume λ to be the same for all risks.

The reader who has not skipped Section 1.4 will recognize the standard deviation premium principle in (1.2.6). However, to understand the material below, it suffices to view
(1.2.6) as the definition of the coefficient

$$\lambda = (c - m)/\sigma.$$

As above, let X be the risk of the original portfolio. Then

$$P(c - X \ge 0) = P(X - m \le \lambda\sigma) = P(X^* \le \lambda).$$

Following the same logic, we adopt λ as a stability characteristic and $\lambda^{-1}$ as a risk measure.
Consider a new risk $X_0$ with mean $m_0$ and variance $\sigma_0^2$. Denote by $c_0$ the premium coming
from the new risk. In accordance with (1.2.6), set $\lambda_0 = (c_0 - m_0)/\sigma_0$. Then

$$c_0 = m_0 + \lambda_0\sigma_0.$$


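We retain all assumptions on X and $X_0$ and reinsurance cost, made in the previous section.
Then for the risk $X_r = X + rX_0$, we have again $m_r = m + r m_0$, $\sigma_r^2 = \sigma^2 + r^2\sigma_0^2$, and the
premium

$$c_r = c + r c_0 = m + \lambda\sigma + r(m_0 + \lambda_0\sigma_0) = m_r + \lambda\sigma + r\lambda_0\sigma_0 = m_r + \frac{\lambda\sigma + r\lambda_0\sigma_0}{\sigma_r}\,\sigma_r = m_r + \lambda(r)\sigma_r,$$

where

$$\lambda(r) = \frac{\lambda\sigma + r\lambda_0\sigma_0}{\sigma_r} = \frac{\lambda\sigma + r\lambda_0\sigma_0}{\sqrt{\sigma^2 + r^2\sigma_0^2}}. \tag{1.2.7}$$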




FIGURE 3. Graphs of λ(r): (a) the case $\lambda \le \lambda_0$; (b) the case $\lambda > \lambda_0$.


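Let $a = \sigma_0/\sigma$. To make (1.2.7) easier to explore, set $x = ra$. Then, dividing the numerator
and the denominator in (1.2.7) by σ, we have

$$\lambda(r) = \frac{\lambda + \lambda_0 x}{\sqrt{1 + x^2}}.$$

Denote the function on the right by $\lambda^*(x)$. Then

$$\lambda(r) = \lambda^*(ra).$$

It is straightforward to verify that $\lambda(0) = \lambda^*(0) = \lambda$, $\lambda^*(x) \to \lambda_0$ as $x \to \infty$, and

$$\lambda^*(x) = \lambda \quad \text{for } x = x_1 = \frac{2\lambda_0\lambda}{\lambda^2 - \lambda_0^2}.$$

The larger λ(r), the less risky the new portfolio. We are looking for the largest $r \le 1$ for
which $\lambda(r) \ge \lambda$. Let

$$r_1 = \frac{x_1}{a} = \frac{2\lambda_0\lambda\sigma}{\sigma_0(\lambda^2 - \lambda_0^2)}.$$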


CASE 1: $\lambda \le \lambda_0$; see Fig.3a. Then $\lambda(r) \ge \lambda$ for all r. Consequently, the best r equals 1,
and reinsurance is not needed.


CASE 2: $\lambda > \lambda_0$; see Fig.3b. In this case, if $r_1 \le 1$, we choose $r = r_1$, and if $r_1 > 1$, we
choose r = 1.
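
The two cases translate directly into a small routine. The Python sketch below is an illustration under the assumptions of this section; the function names are ours, and the arguments are the loadings and standard deviations of the original portfolio and of the new risk.

```python
import math


def loading_after_sharing(r, lam, sigma, lam0, sigma0):
    """lambda(r) from (1.2.7) for the portfolio X + r*X0."""
    return (lam * sigma + r * lam0 * sigma0) / math.sqrt(sigma ** 2 + (r * sigma0) ** 2)


def retention_by_loading(lam, sigma, lam0, sigma0):
    """Largest r <= 1 with lambda(r) >= lambda (Cases 1 and 2)."""
    if lam <= lam0:
        return 1.0                       # Case 1: no reinsurance is needed
    r1 = 2 * lam0 * lam * sigma / (sigma0 * (lam ** 2 - lam0 ** 2))
    return min(r1, 1.0)                  # Case 2
```

For example, λ = 2, σ = 10, λ₀ = 0.5, σ₀ = 20 leads to r₁ = 4/15, and `loading_after_sharing` evaluated at this r returns the original loading 2.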


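EXAMPLE 1. Let the original portfolio consist of n i.i.d. risks $X_i$ with $E\{X_i\} = \tilde m$, $\mathrm{Var}\{X_i\} = \tilde\sigma^2$. We assume that for each separate risk, the premium $\tilde c = \tilde m + \tilde\lambda\tilde\sigma$, where $\tilde\lambda$ is a loading for separate risks. Then, since $m = n\tilde m$, $\sigma^2 = n\tilde\sigma^2$, we can write that

$$\lambda = \frac{c - m}{\sigma} = \frac{n\tilde c - n\tilde m}{\tilde\sigma\sqrt{n}} = \tilde\lambda\sqrt{n},$$

and

$$r_1 = \frac{2\lambda_0\lambda\sigma}{\sigma_0(\lambda^2 - \lambda_0^2)} = \frac{2\lambda_0\tilde\lambda\tilde\sigma\, n}{\sigma_0(\tilde\lambda^2 n - \lambda_0^2)} \sim \frac{2\lambda_0\tilde\sigma}{\tilde\lambda\sigma_0} \tag{1.2.8}$$

for large n. Thus, for large n, reinsurance is needed if $2\lambda_0\tilde\sigma < \tilde\lambda\sigma_0$.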

Now let the loading for the new risk be the same as for the separate risks in the original
portfolio: $\lambda_0 = \tilde\lambda$. Then reinsurance is reasonable if $\sigma_0 > 2\tilde\sigma$. In particular, if the standard
deviation of the new risk equals the standard deviation of separate risks in the original
portfolio, there is no need for reinsurance.


1.3 Long-term insurance: Ruin probability as a criterion 


In the case of long-term insurance, security requirements should concern the behavior of 
the insurance process in the long run. For example, one can choose, as a characteristic of 
riskiness, the ruin probability for a large or infinite horizon. 

Certainly, whatever risk measure we choose, we do not have to proceed basing solely 
on this measure. Usually a particular strategy of managing a risk portfolio is a trade-off 
between security and future profit, for example, the expected profit. If the ruin probability 
were the only criterion the company follows, it would not take any risk at all. Then the ruin 
probability would be equal to zero, but the profit would be zero too. 

If we choose ruin probability as a risk measure, the goal of reinsurance is to reduce 
this probability to a certain chosen level. There are two difficulties, one essential and one 
technical, in this problem. 

First, the cost of reinsurance may be high. In this case, since the payment for reinsurance 
goes from the original premium, the part of the premium that remains may turn out to 
be low. Then the purchase of reinsurance may lead not to a smaller but to a higher ruin 
probability. We will see it in examples below. Hence, we should first figure out the limits 
of reasonable reinsurance. 

Secondly, the ruin probability depends on the interior characteristics of the insurance process and on the initial surplus. When buying reinsurance, we change the former; however,
instead of purchasing reinsurance, we could increase the initial surplus. Consequently,
the purchase of reinsurance is an additional compromise between the level of the surplus
and the size of the risk retained. To simplify the problem and to obtain solutions that hold
for all sizes of the initial surplus, we can proceed not from the ruin probability
itself but from Lundberg’s upper bound $e^{-\gamma u}$, where u stands for the initial surplus and γ for
the adjustment coefficient; see (6.2.1.4). Then we can choose, as a security characteristic,
the adjustment coefficient γ and work with it independently of u.

We will restrict ourselves to two particular examples. 


1.3.1 An example with proportional reinsurance 


Consider the discrete time model of Section 6.2.2.2. Let X be a separate claim, $m = E\{X\}$, $\sigma^2 = \mathrm{Var}\{X\}$, and c be the corresponding premium. For the adjustment coefficient
γ, we use the representation

$$\gamma = \frac{2(c - m)}{\sigma^2}, \tag{1.3.1}$$

which is precise if X is normal and, as was shown in Section 6.2.2.2, may serve as a good
approximation in many other situations.




Consider proportional reinsurance, denoting by r the retention coefficient. The retained
risk amounts to the r.v. $X_r = rX$. Note that, if X is normal, then $X_r$ is also normal, and we
can apply (1.3.1) to it.

Assume that the price for reinsurance is specified by a reinsurance loading $\theta_{\mathrm{reins}}$. Then
given r, the company pays for reinsurance, per each claim X, the price $(1 + \theta_{\mathrm{reins}})E\{X - rX\} = (1 + \theta_{\mathrm{reins}})(1 - r)m$. Consequently, if θ is the security loading of the original insurance, the premium retained after reinsurance is

$$c_r = (1 + \theta)m - (1 + \theta_{\mathrm{reins}})(1 - r)m. \tag{1.3.2}$$

It is straightforward to verify that (1.3.2) may be written as

$$c_r = [r + \theta_{\mathrm{reins}} r - \Delta]\,m,$$

where $\Delta = \theta_{\mathrm{reins}} - \theta$.

It is natural to assume that Δ > 0. Otherwise, the insurer would yield the whole risk to a
reinsurer, keeping the remaining premium as a pure profit.

We have $E\{X_r\} = m_r = rm$, $\mathrm{Var}\{X_r\} = \sigma_r^2 = r^2\sigma^2$, and by virtue of (1.3.1), the adjustment
coefficient for the retained insurance is equal to

$$\gamma_r = \frac{2(c_r - m_r)}{\sigma_r^2} = \frac{2[\theta_{\mathrm{reins}} r - \Delta]\,m}{r^2\sigma^2}. \tag{1.3.3}$$

For r = 1 (no reinsurance), the adjustment coefficient

$$\gamma = \frac{2\theta m}{\sigma^2}.$$

Combining it with (1.3.3), we can write that

$$\gamma_r = \gamma\, g(r), \tag{1.3.4}$$

where the function

$$g(r) = \frac{1}{\theta} \cdot \frac{\theta_{\mathrm{reins}} r - \Delta}{r^2}.$$

Clearly, g(1) = 1. The reader will readily verify that g(r) attains its unique maximum at
the point

$$r_0 = \frac{2\Delta}{\theta_{\mathrm{reins}}} = \frac{2(\theta_{\mathrm{reins}} - \theta)}{\theta_{\mathrm{reins}}}, \tag{1.3.5}$$

and g(r) is decreasing at a point r iff $r > r_0$.
The relation $r_0 \ge 1$ is equivalent to

$$\theta_{\mathrm{reins}} \ge 2\theta. \tag{1.3.6}$$

We interpret this instance as a high reinsurance cost. In this case, g(r) < 1 for all r < 1,
and reinsurance does not lead to a higher stability.

If $r_0 < 1$, taking an r from $[r_0, 1)$, we get a higher adjustment coefficient, which we
interpret as the case of higher stability. The maximum is attained at $r = r_0$, but this is also
the case of the lowest (for r from $[r_0, 1]$) expected income.


EXAMPLE 1. Let θ = 0.1. In view of (1.3.6), reinsurance makes sense only if $\theta_{\mathrm{reins}} < 0.2$. Say, for $\theta_{\mathrm{reins}} = 0.15$, we will have $r_0 = 2/3$, and one third of risk may be reinsured. The
choice of a particular r between 2/3 and 1 should be determined by a compromise between
stability requirements and the mean profit.
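
The computations of this example are easy to reproduce. In the Python sketch below, which is only an illustration, the function names are ours; `stability_ratio` is the function g(r) from (1.3.4), and `most_stable_retention` returns the point (1.3.5) truncated at 1.

```python
def stability_ratio(r, theta, theta_reins):
    """g(r): adjustment coefficient after reinsurance divided by the one before."""
    delta = theta_reins - theta
    return (theta_reins * r - delta) / (theta * r ** 2)


def most_stable_retention(theta, theta_reins):
    """Retention maximizing g on (0, 1]: r0 from (1.3.5), or 1 if r0 >= 1."""
    r0 = 2 * (theta_reins - theta) / theta_reins
    return min(r0, 1.0)
```

For θ = 0.1 and θ_reins = 0.15 this gives r₀ = 2/3 and g(r₀) = 1.125, that is, the adjustment coefficient can be raised by at most 12.5% in this setting.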




1.3.2 An example with excess-of-loss insurance 


Consider now the continuous time model of Section 6.2.2.4. In this case, the adjustment
coefficient γ is a solution to the equation

$$M_X(z) = 1 + cz, \tag{1.3.7}$$

where $c = (1 + \theta)m$ is the premium per one claim, and X is the r.v. of one claim; see
(6.2.2.22). For the rest, the notation is the same as in the previous section.

Let X be uniformly distributed on [0,1]. This is, of course, a particular case, but the fact 
that we chose the unit interval does not restrict generality. 

Consider the instance of excess-of-loss insurance with a level d < 1, which means the
following.

• For each claim X the company covers the part amounting to the r.v.

$$X_d = \begin{cases} X & \text{if } X \le d, \\ d & \text{if } X > d. \end{cases}$$

• The company pays for such a reinsurance $(1 + \theta_{\mathrm{reins}})E\{X - X_d\}$ per one claim.

Since X is uniform on [0,1], the r.v. $X_d$ assumes the value d with probability 1 − d. It
is not difficult to verify that the conditional distribution of $X_d$, given $X \le d$, is uniform on
[0,d]. Making use of it, one can calculate that

$$E\{X_d\} = m_d = d - \tfrac{1}{2}d^2, \tag{1.3.8}$$

$$E\{X - X_d\} = \tfrac{1}{2}(1 - d)^2,$$

$$\mathrm{Var}\{X_d\} = \sigma_d^2 = \tfrac{1}{12}d^3(4 - 3d). \tag{1.3.9}$$


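Consequently, the premium per one claim after reinsurance is equal to

$$c_d = (1 + \theta)E\{X\} - (1 + \theta_{\mathrm{reins}})E\{X - X_d\} = \tfrac{1}{2}\left[(1 + \theta) - (1 + \theta_{\mathrm{reins}})(1 - d)^2\right]. \tag{1.3.10}$$

The m.g.f.

$$M_{X_d}(z) = \int_0^d e^{zx}\,dx + e^{zd}(1 - d) = \frac{1}{z}\left[e^{zd}\bigl(1 + (1 - d)z\bigr) - 1\right]. \tag{1.3.11}$$

Given d, the adjustment coefficient $\gamma_d$ is a solution to the equation

$$M_{X_d}(z) = 1 + c_d z.$$

Substituting (1.3.10) and (1.3.11), we come to the equation

$$e^{zd}\bigl[1 + (1 - d)z\bigr] = 1 + z + \tfrac{1}{2}\left[(1 + \theta) - (1 + \theta_{\mathrm{reins}})(1 - d)^2\right]z^2. \tag{1.3.12}$$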




The l.-h.s. of the equation is an exponential function multiplied by a linear function, and
the r.-h.s. is a quadratic function. Certainly, this equation cannot be solved analytically, but
solving it numerically for particular values of the parameters does not present any difficulty.
A reasonable procedure would be as follows.

For given values of θ and $\theta_{\mathrm{reins}}$, we consider a sequence of d’s running from 1 to 0 in
small steps. For each d, we find numerically a solution $\gamma_d$ to (1.3.12). We stop the procedure
when the sequence $\gamma_d$ starts to decrease.
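
One possible way to organize such a computation is sketched below in Python. It is an illustration, not a tuned solver: the function names, the step size and the bisection tolerance are our own choices, and we assume θ_reins > θ > 0. The positive root of (1.3.12) is located by bisection, using the fact that the difference between the two sides is negative to the left of the root and positive to the right of it.

```python
import math


def retained_premium(d, theta, theta_reins):
    """Premium c_d per claim after reinsurance, formula (1.3.10)."""
    return 0.5 * ((1 + theta) - (1 + theta_reins) * (1 - d) ** 2)


def lundberg_gap(z, d, theta, theta_reins):
    """Left-hand side minus right-hand side of equation (1.3.12)."""
    c_d = retained_premium(d, theta, theta_reins)
    return math.exp(z * d) * (1 + (1 - d) * z) - (1 + z + c_d * z * z)


def adjustment_coefficient(d, theta, theta_reins, tol=1e-12):
    """Positive root of (1.3.12) by bisection; None if there is no such root."""
    # A positive root needs c_d > E{X_d}, i.e. theta > theta_reins * (1 - d)^2.
    if theta <= theta_reins * (1 - d) ** 2:
        return None
    lo = 1.0
    for _ in range(60):                  # shrink until lo is left of the root
        if lundberg_gap(lo, d, theta, theta_reins) < 0:
            break
        lo /= 2
    else:
        return None
    hi = 2 * lo
    while lundberg_gap(hi, d, theta, theta_reins) <= 0:   # grow until right of it
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lundberg_gap(mid, d, theta, theta_reins) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scan_retention_levels(theta, theta_reins, step=0.01):
    """Lower d from 1 in small steps; stop when the coefficient stops growing."""
    best_d = 1.0
    best_gamma = adjustment_coefficient(1.0, theta, theta_reins)
    for i in range(1, int(round(1 / step))):
        d = 1.0 - i * step
        gamma = adjustment_coefficient(d, theta, theta_reins)
        if gamma is None or gamma <= best_gamma:
            break
        best_d, best_gamma = d, gamma
    return best_d, best_gamma
```

A call such as `scan_retention_levels(0.1, 0.3)` returns the last level d at which the coefficient was still growing, together with the corresponding root; if the very first step already lowers the coefficient, the pair for d = 1 (no reinsurance) is returned.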

We will not carry out this procedure here, but consider a simpler approximation based 
on (1.3.1). As we noted in Section 6.2.4, (1.3.1) may be not a bad approximation in the 
continuous time case also. [See in this section the comparison of the equations (6.2.4.1) 
and (6.2.4.2).] 

Thus, we use the approximation

$$\gamma_d \approx \frac{2(c_d - m_d)}{\sigma_d^2} = 24\,q(d), \tag{1.3.13}$$

where

$$q(d) = \frac{c_d - m_d}{d^3(4 - 3d)} = \frac{\theta - \theta_{\mathrm{reins}}(1 - d)^2}{2d^3(4 - 3d)}.$$

The last formula follows from (1.3.8), (1.3.9), and (1.3.10).
The function q(d) is easy to calculate. The table below contains its values for θ = 0.1,
$\theta_{\mathrm{reins}} = 0.3$.

d       | 1 | 0.9  | 0.8  | 0.7  | 0.69   | 0.68   | 0.67   | 0.66   | 0.65
100q(d) | 5 | 5.11 | 5.37 | 5.60 | 5.6125 | 5.6207 | 5.6247 | 5.6238 | 5.617

We see that $\gamma_d$ begins to decrease at d ≈ 0.67.
However, this is a rough approximation. As we saw in Section 6.2.4, the approximation
(1.3.13) works well for small γ’s. In our case, γ has an order of 24q(d) ≈ 24 · 0.056 = 1.344,
which is not small. So, for a more precise solution we should appeal to (1.3.12). □
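
The values in the table can be recomputed with a few lines of Python. The sketch below is illustrative; the function name is ours, and q(d) is written in the simplified form q(d) = [θ − θ_reins(1 − d)²] / [2d³(4 − 3d)], which follows from (1.3.8) to (1.3.10).

```python
def q(d, theta=0.1, theta_reins=0.3):
    """q(d) from the approximation (1.3.13), for X uniform on [0, 1]."""
    return (theta - theta_reins * (1 - d) ** 2) / (2 * d ** 3 * (4 - 3 * d))


levels = [1.0, 0.9, 0.8, 0.7, 0.69, 0.68, 0.67, 0.66, 0.65]
table = [(d, 100 * q(d)) for d in levels]          # pairs (d, 100 q(d))
best_level = max(levels, key=q)                    # grid point with the largest q
```

On this grid the largest value is attained at d = 0.67, in agreement with the table.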


2 RISK EXCHANGE AND RECIPROCITY OF COMPANIES 
2.1 A general framework and some examples 


As was noted in the beginning of this chapter, when talking about risk exchange, we 
should distinguish two different approaches (which, however, may be combined). In the 
first approach, risk (or random income) is considered a commodity which may be traded. 
The exchange mechanism in this case is a decentralized market mechanism which appears 
to be efficient in the case of equilibrium. 

A classical example is a stock exchange market, but we can talk also about a reinsurance 
market where insurance companies exchange risks. We consider a corresponding model in 
Section 3. 




The second approach concerns the situation when companies enter into direct negotia- 
tions about possibilities of risk exchange. The process of such negotiations is connected 
with the concept of a non-zero-sum or cooperative game. 

A zero-sum game, say, in the situation of two players, is a game in which for any combi- 
nation of the players’ strategies, one player wins exactly the amount the player’s opponent 
loses. In a non-zero-sum game, a gain by one player does not necessarily correspond with 
a loss of the other. There are strategies under which both players may benefit in compari- 
son with what they had in their initial positions. The problem here consists in determining 
such strategies and choosing one of them. The corresponding theory is usually called the 
bargaining theory (see, e.g., [82], [97]). 

The negotiations between two or more companies about sharing the risks they have may 
be considered a cooperative or non-zero-sum game. The companies analyze possible out- 
comes and establish principles which would lead to a particular exchange of risk, reason- 
able (“fair”) from all companies’ point of view. 


EXAMPLE 1. We proceed from the scheme of Section 1.2.1. Two companies have
initial portfolios whose risks amount to r.v.’s $X_1, X_2$ with means $m_1, m_2$ and variances $\sigma_1^2, \sigma_2^2$,
respectively. The companies decide to share a new risk $X_0$ with a mean of $m_0$ and a variance
of $\sigma_0^2$. The security loading is the same for all possible insurances.

Let r be the share of the new risk which goes to the first company and, accordingly, (1— r) 
is the share of the second company. The companies negotiate on a reasonable choice of r. 

Given r, the new risks for the companies are X,; = X1 + rXo, X,2 = X2 + (1 — r)Xo. 

Suppose, as in Section 1.2.1, that the companies choose variation coefficient (1.2.2) as a
risk measure. Then the companies proceed from the respective coefficients

$$k_1(r) = \frac{\sqrt{\sigma_1^2 + r^2\sigma_0^2}}{m_1 + r m_0}, \qquad k_2(1 - r) = \frac{\sqrt{\sigma_2^2 + (1 - r)^2\sigma_0^2}}{m_2 + (1 - r)m_0}. \tag{2.1.1}$$

One natural rule here could consist in the principle of equal riskiness for both companies, which leads to the
equation

$$k_1(r) = k_2(1 - r), \tag{2.1.2}$$

provided that a solution $r \in [0, 1]$ exists; see also Fig.4.

FIGURE 4.

Assume, for instance, that $m_1 = m_2$ and are much larger than $m_0$. Then the denominators in (2.1.1) are close to each
other, and (2.1.2) may be approximated by the equation

$$\sigma_1^2 + r^2\sigma_0^2 = \sigma_2^2 + (1 - r)^2\sigma_0^2.$$

A simple algebra leads to the solution

$$r = \frac{\sigma_0^2 + \sigma_2^2 - \sigma_1^2}{2\sigma_0^2}. \tag{2.1.3}$$

If the two original portfolios have the same parameters (that is, $\sigma_1^2 = \sigma_2^2$, in addition to
the equality of the mean values), then r = 1/2, which is natural. The example is continued in
Exercise 8.
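
To see how accurate the approximation (2.1.3) is, one may solve the equal-riskiness equation (2.1.2) numerically and compare. The Python sketch below is illustrative only; the function names are ours, the arguments are means and variances, and the exact root is sought by bisection on [0, 1] under the assumption that the difference of the two coefficients changes sign there.

```python
import math


def share_equal_riskiness_approx(var1, var2, var0):
    """Formula (2.1.3): share of the first company when m1 = m2 >> m0."""
    return (var0 + var2 - var1) / (2 * var0)


def share_equal_riskiness(m1, var1, m2, var2, m0, var0, tol=1e-12):
    """Root of k1(r) = k2(1 - r) on [0, 1] by bisection; None if no sign change."""
    def gap(r):
        k1 = math.sqrt(var1 + r ** 2 * var0) / (m1 + r * m0)
        k2 = math.sqrt(var2 + (1 - r) ** 2 * var0) / (m2 + (1 - r) * m0)
        return k1 - k2

    lo, hi = 0.0, 1.0
    if gap(lo) * gap(hi) > 0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For instance, for means 1000, 1000 and 1, and variances 100, 104 and 16, formula (2.1.3) gives r = 0.625, and the root of (2.1.2) differs from it by less than 0.01.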




Consider now a general scheme. To emphasize that it concerns not only insurance but a 
quite general situation, we will sometimes use the term ‘participant’ rather than ‘company’. 

The scheme involves n participants. The ith participant is characterized by a future random income $Y_i$.

When a participant is an insurance company, $Y_i$ is the future surplus of the company, and
we set $Y_i = c_i - X_i$, where $X_i$ is the future payment provided by the company, and $c_i$ is the
premium collected, i = 1, ..., n. In the general framework, the structure of $Y_i$ is arbitrary.

Set $Y = (Y_1, ..., Y_n)$. This is the vector of initial (random) incomes. Assume that the
participants exchange somehow parts of their risks. In the case of insurance, it means that
the companies exchange parts of their portfolios.

Denote by $Z_i$ the income of the ith participant after exchange. The random vector (r.vec.)
$Z = (Z_1, ..., Z_n)$ specifies this exchange. We assume that not all types of exchange are
allowed or possible, and the vector Z belongs to some set $\mathcal{Z}$ which represents all admissible
types of exchange. For example, in many situations it is natural to assume that the total
income does not change, so for all $Z \in \mathcal{Z}$ we should have

$$\sum_{i=1}^n Z_i = \sum_{i=1}^n Y_i. \tag{2.1.4}$$


However, this condition may not exhaust all possible conditions for portfolios exchange. 


EXAMPLE 2. Consider two companies with original risks $X_1, X_2$ and premiums $c_1, c_2$,
respectively. The companies are negotiating about mutual reinsurance of their risks. Assume that the companies choose to restrict themselves to proportional reinsurance. Denote
by $r_1, r_2$ the retention coefficients for the first and the second company, respectively. More
precisely, the first company retains the share $r_1$ of its own risk, cedes the share $(1 - r_1)$ of its
risk, and accepts the share $(1 - r_2)$ of the second company’s risk. The respective numbers
for the second company are $r_2$, $(1 - r_2)$, and $(1 - r_1)$.

In this case, the reinsurance procedure is specified by a
pair $r = (r_1, r_2)$ which is a point in the unit square $\mathcal{R} = \{0 \le r_1 \le 1,\ 0 \le r_2 \le 1\}$; see Fig.5.

FIGURE 5. The unit square of pairs $(r_1, r_2)$, with the point (1,1) marked.

Given r, the first company covers the claim $r_1X_1$ and the
claim $(1 - r_2)X_2$ ceded by the second company. So, the total claim covered by the first company is $X_{r1} = r_1X_1 + (1 - r_2)X_2$. For the second company the corresponding claim is
$X_{r2} = (1 - r_1)X_1 + r_2X_2$.

Certainly, the companies should somehow redistribute the
premiums they have. The ways to do that may be different.
We will discuss some of them below but for now, not specifying the rule of premium distribution, we will just denote by $c_{ri}$ the premium the ith company keeps after reinsurance.
Certainly,

$$c_{r1} + c_{r2} = c_1 + c_2.$$

Then, the profit of the ith company after such an exchange is the r.v.

$$Z_{ri} = c_{ri} - X_{ri}, \quad i = 1, 2,$$

and the set $\mathcal{Z}$ consists of random vectors $Z_r = (Z_{r1}, Z_{r2})$ where $r \in \mathcal{R}$.


Suppose now that each participant, when evaluating the quality of its position, proceeds 
from a function U(Z) assuming real values and perhaps different for different participants, 
defined on a set of r.v.’s Z. The participant views U(Z) as a characteristic of “quality” of 
the r.v. Z (from the participant’s point of view). For example, if a participant is an expected 
utility maximizer, then 

U(Z) =E{u(Z)}, (2.1.5) 


where u(x) is the utility function of the participant. Another example concerns the mean- 
variance criterion for which 


$$U(Z) = \tau E\{Z\} - \mathrm{Var}\{Z\}, \tag{2.1.6}$$

and τ is a tolerance-to-risk coefficient, a weight the participant assigns to expectation; see
Section 1.1.2.5. Different participants may have different τ’s. We remember that the criterion (2.1.6) is not applicable in the general case (see the section mentioned for detail) but,
for example, when Z’s are normal, the criterion is meaningful.

We call the value of U(Z) a quality index of Z. 


Denote by $U_i(Z)$ the quality-measure-function U of the ith participant. Then, before
exchange, the quality index of the ith participant is equal to the number $V_{i0} = U_i(Y_i)$, and
after exchange, to the number $V_i = U_i(Z_i)$. Set $V^{(0)} = (V_{10}, ..., V_{n0})$, $V = (V_1, ..., V_n)$.

The vector V is specified by the choice of Z. If we denote by U(Z) the vector-function
$(U_1(Z_1), ..., U_n(Z_n))$, then

$$V = U(Z). \tag{2.1.7}$$

Thus, each r.vec. Z from the set $\mathcal{Z}$ of all admissible exchanges generates a point $V = U(Z) = (U_1(Z_1), ..., U_n(Z_n))$
in $\mathbb{R}^n$; for n = 2, the point lies on a plane.

Denote by $\mathcal{V}$ the set of all possible points V = U(Z) generated by the map (2.1.7). The set $\mathcal{V}$ is the image of $\mathcal{Z}$, and
represents all possible positions the participants may attain.
A typical picture is depicted in Fig.6.

The point $V^{(0)}$, representing the initial position of the participants, is called a status-quo point. Usually it is an interior
point of the set $\mathcal{V}$, as it is shown in Fig.6.

FIGURE 6.


EXAMPLE 3. This example is somewhat artificial but illustrative. Later, we will consider more realistic examples. Let n = 2, each participant is an expected utility maximizer
with $u(x) = \sqrt{x}$; the initial incomes $Y_1, Y_2$ are positive r.v.’s such that $E\{u(Y_i)\} = E\{\sqrt{Y_i}\}$
is finite for i = 1, 2.

To make the example simpler, assume that the participants have a right to refuse a part
of the income, say, choose $Z_1 = 0$, $Z_2 = 0$. Certainly, it will not correspond to optimal
behavior and eventually such solutions will be rejected, but it is convenient to include these
possibilities into original consideration. Such an assumption means that, instead of the
condition $Z_1 + Z_2 = Y_1 + Y_2$, we will consider the condition

$$Z_1 + Z_2 \le Y_1 + Y_2. \tag{2.1.8}$$




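Assume that the class $\mathcal{Z}$ consists of all positive r.vec.’s $Z = (Z_1, Z_2)$ for which (2.1.8) is
true.

In this case, $\mathcal{V}$ is the quarter of a disk, more precisely, $\mathcal{V}$ is the set

$$V_1^2 + V_2^2 \le a^2, \quad 0 \le V_1 \le a, \quad 0 \le V_2 \le a, \tag{2.1.9}$$

where $a = E\{\sqrt{Y_1 + Y_2}\}$; see Fig.7.

FIGURE 7.

Indeed, let us set $S = Y_1 + Y_2$, $V_i = E\{\sqrt{Z_i}\}$. By the
Cauchy-Schwarz inequality (0.2.4.4),

$$V_i^2 = \left(E\{\sqrt{Z_i}\}\right)^2 = \left(E\left\{\sqrt{\frac{Z_i}{\sqrt{S}}}\; S^{1/4}\right\}\right)^2 \le E\left\{\frac{Z_i}{\sqrt{S}}\right\} E\{\sqrt{S}\}.$$

Then, since $Z_1 + Z_2 \le Y_1 + Y_2 = S$,

$$V_1^2 + V_2^2 \le E\{\sqrt{S}\}\, E\left\{\frac{Z_1 + Z_2}{\sqrt{S}}\right\} \le E\{\sqrt{S}\}\, E\left\{\frac{S}{\sqrt{S}}\right\} = \left(E\{\sqrt{S}\}\right)^2 = a^2.$$

To show that the boundary

$$\{V_1^2 + V_2^2 = a^2,\ 0 \le V_1 \le a,\ 0 \le V_2 \le a\} \tag{2.1.10}$$

is attainable, set $Z_1 = kS$, $Z_2 = (1 - k)S$ for a constant $k \in [0, 1]$. Then

$$V_1^2 + V_2^2 = \left(E\{\sqrt{Z_1}\}\right)^2 + \left(E\{\sqrt{Z_2}\}\right)^2 = \left(E\{\sqrt{kS}\}\right)^2 + \left(E\{\sqrt{(1 - k)S}\}\right)^2 = k\left(E\{\sqrt{S}\}\right)^2 + (1 - k)\left(E\{\sqrt{S}\}\right)^2 = a^2.$$

In Exercise 9a, we show that any point satisfying (2.1.9) corresponds to a solution
$(Z_1, Z_2)$.

The status-quo point is $V^{(0)} = (U_1(Y_1), U_2(Y_2)) = (E\{\sqrt{Y_1}\}, E\{\sqrt{Y_2}\})$. It may be shown
that under mild conditions, it is located inside $\mathcal{V}$.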


Let us consider a particular example. Let $Y_1, Y_2$ be independent, and uniformly distributed on [0, 1]. Then $E\{\sqrt{Y_i}\} = \int_0^1 \sqrt{x}\,dx = \tfrac{2}{3}$, and $V^{(0)} = \left(\tfrac{2}{3}, \tfrac{2}{3}\right)$.
The r.v. S has a triangular distribution with the density
$f(x) = 1 - |x - 1|$ for $x \in [0, 2]$. (See Example 2.2.1.1;
the density (2.2.1.6) in this example may be represented
as we did.) Then

$$a = E\{\sqrt{S}\} = \int_0^2 \sqrt{x}\,(1 - |x - 1|)\,dx = \frac{16}{15}\sqrt{2} - \frac{8}{15} \approx 0.975.$$

See Fig.8.

FIGURE 8.
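
The value of a can be double-checked numerically. The Python sketch below is an illustration: it integrates √x·(1 − |x − 1|) over [0, 2] by the midpoint rule (our choice of method and of the number of nodes) and compares the result with the closed form (16√2 − 8)/15.

```python
import math


def expected_sqrt_of_sum(nodes=20000):
    """Midpoint-rule approximation of E{sqrt(S)} for the triangular density on [0, 2]."""
    h = 2.0 / nodes
    total = 0.0
    for i in range(nodes):
        x = (i + 0.5) * h
        total += math.sqrt(x) * (1.0 - abs(x - 1.0)) * h
    return total


a_closed_form = (16 * math.sqrt(2) - 8) / 15
a_numeric = expected_sqrt_of_sum()
```

Both numbers agree with the value 0.975 given above to three digits.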




EXAMPLE 4. In the situation of Example 2, $\mathcal{V}$ consists of all points $(V_1(r), V_2(r))$,
where $V_i(r) = U_i(c_{ri} - X_{ri})$, and $r \in \mathcal{R}$. The status-quo point corresponds to $r^{(0)} = (1, 1)$
(see also Fig.5), so $V^{(0)} = (U_1(Y_1), U_2(Y_2)) = (U_1(c_1 - X_1), U_2(c_2 - X_2))$. We will consider
this example later in greater detail.


We continue to consider the general scheme. The participants enter into negotiations in 
order to agree on some rule of exchange Z appropriate for all participants. 


The participants would act rationally if they rule out any agreement such that there exists 
another agreement which will give higher quality indices simultaneously to all participants. 

Accordingly, we say that a point V from $\mathcal{V}$ is Pareto optimal if there is no point from $\mathcal{V}$
all of whose coordinates are not less than the corresponding coordinates of V, and at least
one coordinate is strictly larger. We denote the set of all Pareto optimal points by $\tilde{\mathcal{V}}$, and
the set of corresponding r.vec.’s Z by $\tilde{\mathcal{Z}}$.

In the general picture in Fig.6, $\tilde{\mathcal{V}}$ is the “North-East” boundary of $\mathcal{V}$. In the particular
Example 3, this is the boundary (2.1.10): the quarter of the circle in Fig.7.


Certainly, we may restrict ourselves to points from $\tilde{\mathcal{V}}$, and accordingly to exchanges
(solutions) from $\tilde{\mathcal{Z}}$.


Now, if the status-quo point $V^{(0)}$ is an interior point of $\mathcal{V}$, there are points V all of whose
coordinates are strictly larger than the corresponding coordinates of $V^{(0)}$. This implies that
all participants can simultaneously improve their quality indices in comparison with their
initial positions.

Denote the set of all such points by $\mathcal{V}_0$ (the area “to the North-East” of $V^{(0)}$ in Fig.9),
and the set of the corresponding r.vec.’s Z by $\mathcal{Z}_0$. Each point from $\mathcal{Z}_0$ generates a point
from $\mathcal{V}_0$; in other words, $\mathcal{V}_0$ is the image of $\mathcal{Z}_0$.

It is clear that all participants may agree simultaneously
only on a point from $\mathcal{V}_0$. On the other hand, it is reasonable
to consider only points from $\tilde{\mathcal{V}}$, so we should consider the
intersection of these sets, that is, the set of all Pareto optimal
points from $\mathcal{V}_0$.

Denote this set by $\tilde{\mathcal{V}}_0$, and the set of corresponding r.vec.’s
Z by $\tilde{\mathcal{Z}}_0$. In Fig.9, the set $\tilde{\mathcal{V}}_0$ is the part of the boundary to
the “North-East” of $V^{(0)}$.

FIGURE 9.


We call points from $\tilde{\mathcal{Z}}_0$ Pareto optimal solutions to the bargaining problem under consideration.


EXAMPLE 5. Consider the numerical sub-example of Example 3. We saw already that
in this case, Pareto optimal solutions are vectors $Z = (Z_1, Z_2) = (kS, (1 - k)S)$, for $k \in [0, 1]$.
The Pareto optimal points are $V = (E\{\sqrt{kS}\}, E\{\sqrt{(1 - k)S}\}) = E\{\sqrt{S}\}(\sqrt{k}, \sqrt{1 - k}) = a(\sqrt{k}, \sqrt{1 - k})$. On the other hand, for V to belong to $\tilde{\mathcal{V}}_0$, both coordinates should
be larger than 2/3. So, we should have

$$a\sqrt{k} > \frac{2}{3}, \qquad a\sqrt{1 - k} > \frac{2}{3},$$

which results in

$$\frac{4}{9a^2} < k < 1 - \frac{4}{9a^2}.$$

Since $a = \frac{16}{15}\sqrt{2} - \frac{8}{15} \approx 0.975$, we come to the solutions
$Z = (kS, (1 - k)S)$ for 0.467 < k < 0.532
up to the third digit. The set $\tilde{\mathcal{V}}_0$ in this case is shown in Fig.10.

FIGURE 10.
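
The bounds for k are easy to verify. The Python sketch below is only an illustration with our own variable names; it uses the closed-form value a = (16√2 − 8)/15 and the status-quo level 2/3 of this example.

```python
import math

a = (16 * math.sqrt(2) - 8) / 15      # E{sqrt(S)} for S = Y1 + Y2
status_quo = 2 / 3                    # E{sqrt(Y_i)} for Y_i uniform on [0, 1]

k_low = (status_quo / a) ** 2         # from a * sqrt(k) > 2/3
k_high = 1 - k_low                    # from a * sqrt(1 - k) > 2/3


def pareto_point(k):
    """Quality indices (V1, V2) for the exchange Z = (kS, (1 - k)S)."""
    return (a * math.sqrt(k), a * math.sqrt(1 - k))
```

For example, `pareto_point(0.5)` gives two equal coordinates of about 0.69, both above the status-quo level 2/3, and the point lies on the boundary (2.1.10).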


Thus, for any particular problem, the task is to determine the set $\tilde{\mathcal{Z}}_0$, and then, proceeding from some additional principles, to choose one particular Pareto solution.


These additional principles may be different and are strongly connected with the nature of 
the concrete problem under consideration. In the next sections, we consider some examples. 


Next, we discuss one general approach to determining the set $\tilde{\mathcal{Z}}_0$. We will do that
at a heuristic level, skipping some formalities. The approach consists in maximizing the
function

$$G(Z) = a_1U_1(Z_1) + ... + a_nU_n(Z_n), \tag{2.1.11}$$

where $a_i$’s are positive constants. Consider the set of all Z’s for which G(Z) is equal to
some constant c. The image of each point Z from this set is a point $V = (V_1, ..., V_n)$ for
which

$$a_1V_1 + ... + a_nV_n = c. \tag{2.1.12}$$

Thus, the point V lies in the hyperplane defined by (2.1.12).

In the two-dimensional case, this is a line, as is shown in
Fig.11. To make the constant c (and, hence, the value of
G(Z)) as large as possible, we should “move this line to the
North-East” up to the moment when the line becomes the
supporting line for the set $\mathcal{V}$, that is, when the whole set $\mathcal{V}$
is below the line. It is possible if the set $\mathcal{V}$ is convex. In
Fig.11, this is the line $l_1$.

The intersection of this highest line with $\mathcal{V}$ will lie in $\tilde{\mathcal{V}}$,
the Pareto optimal boundary of $\mathcal{V}$. So, for a fixed vector of
coefficients $a = (a_1, a_2)$, we will get a Pareto optimal point
from $\tilde{\mathcal{V}}$. If $\mathcal{V}$ is convex, considering all possible vectors a, we will come to all possible
Pareto optimal points in $\mathcal{V}$. The cases $a_1 = 0$ or $a_2 = 0$ will correspond to the end-points
of the Pareto optimal set.

FIGURE 11.

The n-dimensional case is similar. The only difference is that in this case we deal not
with supporting lines but with supporting hyperplanes.


EXAMPLE 6. Consider the scheme of Examples 2 and 4. All vectors Z in this case may
be identified with the vector $r = (r_1, r_2)$ of retention coefficients. For each r, we consider
the point $V(r) = (V_1(r), V_2(r)) = (V_1(r_1, r_2), V_2(r_1, r_2))$, as it was defined in Example 4.

FIGURE 12.

Denote by $\tilde{\mathcal{R}}$ the set corresponding to the set $\tilde{\mathcal{Z}}$. The set $\tilde{\mathcal{R}}$ is the set of all Pareto optimal
solutions. Typically, this is a curve in $\mathcal{R}$, as it was shown in Fig.12. To find $\tilde{\mathcal{R}}$, we should
maximize the function

$$Q(r_1, r_2) = a_1V_1(r_1, r_2) + a_2V_2(r_1, r_2)$$

for all non-negative $a_1, a_2$.
Assume that $V_i(r_1, r_2)$, i = 1, 2, are differentiable. Then we can make use of the gradients
of the functions $V_1(r_1, r_2)$ and $V_2(r_1, r_2)$, that is, the vectors of partial derivatives

$$\nabla V_1 = \nabla V_1(r_1, r_2) = \left(\frac{\partial V_1(r_1, r_2)}{\partial r_1}, \frac{\partial V_1(r_1, r_2)}{\partial r_2}\right), \qquad \nabla V_2 = \nabla V_2(r_1, r_2) = \left(\frac{\partial V_2(r_1, r_2)}{\partial r_1}, \frac{\partial V_2(r_1, r_2)}{\partial r_2}\right).$$

As we know from Calculus, the gradient $\nabla V_i$ points out in what direction we should move
from the point $(r_1, r_2)$ for the function $V_i(r_1, r_2)$ to change fastest. In particular, if we move
in a direction such that the projection of $\nabla V_i$ on this direction is positive, then $V_i(r_1, r_2)$
increases.

If the maximum is attained at a point $r = (r_1, r_2)$, and this point is an interior point of the
square $\mathcal{R}$, then

$$\nabla Q = a_1\nabla V_1 + a_2\nabla V_2 = 0. \tag{2.1.13}$$

If the point r does not correspond to an end-point of the Pareto optimal set, both $a_i$’s are
positive, and $\nabla V_1 = -(a_2/a_1)\nabla V_2$. Thus, we have come to the condition

$$\nabla V_1 = -k\nabla V_2 \quad \text{for some } k > 0. \tag{2.1.14}$$

Condition (2.1.14) means that if an interior point r is Pareto optimal, the vectors $\nabla V_1(r_1, r_2)$
and $\nabla V_2(r_1, r_2)$ have opposite directions; see Fig.12a.

We could also come to the condition (2.1.14) reasoning in the following way. Assume
that $(r_1, r_2)$ is an interior point, $\nabla V_1, \nabla V_2$ are non-zero vectors, and the condition (2.1.14) is
not true. Then (see also Fig.12b) we could move in the direction $\nabla V_1 + \nabla V_2$ (the sum of
the gradient vectors), and both values, $V_1(r_1, r_2)$ and $V_2(r_1, r_2)$, would have increased. This
would have meant that $(r_1, r_2)$ is not a Pareto optimal solution. Note that in the reasoning
above, we did not use the convexity of $\mathcal{R}$.
Two more remarks. First, $\nabla V_1(r_1, r_2)$ and $\nabla V_2(r_1, r_2)$ do not have to be tangent to $\tilde{\mathcal{R}}$.
Second, since (2.1.14) deals only with first derivatives, it is not a sufficient condition, and
it concerns only interior points of $\mathcal{R}$.


Next, we consider several particular problems. 


2.2 Two more examples with expected utility maximization 


Consider n participants with utility functions $u_1(x), ..., u_n(x)$. Assume that all functions
$u_i(x)$ are sufficiently smooth, $u_i'(x) > 0$ and $u_i''(x) < 0$ for all x’s. So, the participants are
risk averters.

In accordance with the scheme of the previous section, to find the Pareto optimal solu- 
tions we should try to maximize the function 


$$G(Z) = \sum_{i=1}^n a_i E\{u_i(Z_i)\}$$

subject to the condition

$$\sum_{i=1}^n Z_i = \sum_{i=1}^n Y_i. \tag{2.2.1}$$


The following proposition belongs to K. Borch; see [16] and references therein. Some 
refinements and generalizations are obtained by H. Gerber in [42]. 


Proposition 3 Let $\tilde Z = (\tilde Z_1, ..., \tilde Z_n)$ satisfy (2.2.1), and

$$a_1u_1'(\tilde Z_1) = ... = a_nu_n'(\tilde Z_n). \tag{2.2.2}$$

Then, for any $Z = (Z_1, ..., Z_n)$ satisfying (2.2.1),

$$G(Z) \le G(\tilde Z). \tag{2.2.3}$$


Proof. Denote by k the number $a_iu_i'(\tilde Z_i)$ which, in accordance with (2.2.2), does not
depend on i. Since the functions $u_i(x)$ are concave and smooth, for any x and $\tilde x$

$$u_i(x) \le u_i(\tilde x) + u_i'(\tilde x)(x - \tilde x).$$

(See Section 0.4.3.) Then

$$a_iu_i(Z_i) \le a_iu_i(\tilde Z_i) + a_iu_i'(\tilde Z_i)(Z_i - \tilde Z_i) = a_iu_i(\tilde Z_i) + k(Z_i - \tilde Z_i).$$

Consequently,

$$\sum_{i=1}^n a_iu_i(Z_i) \le \sum_{i=1}^n a_iu_i(\tilde Z_i) + k\sum_{i=1}^n (Z_i - \tilde Z_i) = \sum_{i=1}^n a_iu_i(\tilde Z_i),$$

because Z and $\tilde Z$ both satisfy (2.2.1). Taking expectations we come to (2.2.3). ■


EXAMPLE 1 is practically the same as the corresponding example in [42]. Let u;(x) = 
—e 8", In accordance with (2.2.2), 


aju' (Zi) =k, 
where k is a number not depending on i. From this it follows that a;B;exp{—B;Z;} = k, and 
~ 1 
Zi = =C +b;i, (2.2.4) 
Bi 
where C = —Ink, bi = g; InaiBi. Since a; is an arbitrary positive number, b; is arbitrary 


number, positive or negative, for all i. 
Recall that )"_, Z; = S, where S = )°"_, Y;. Summing up the left and the right sides of 
equation (2.2.4), we have 


8=C¥-(1/B:)+¥ bs 


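Solving it with respect to $C$ and inserting into (2.2.4), we arrive at

$$\tilde{Z}_i = \gamma_i S + c_i, \tag{2.2.5}$$
where
$$\gamma_i = \frac{1/\beta_i}{\sum_{j=1}^{n} (1/\beta_j)}, \tag{2.2.6}$$
$$c_i = b_i - \gamma_i \sum_{j=1}^{n} b_j. \tag{2.2.7}$$

Clearly $\gamma_i > 0$, and $\sum_{i=1}^{n} \gamma_i = 1$. Now, $\sum_{i=1}^{n} c_i = \sum_{i=1}^{n} \left(b_i - \gamma_i \sum_{j=1}^{n} b_j\right) = \sum_{i=1}^{n} b_i - \sum_{i=1}^{n} \gamma_i \sum_{j=1}^{n} b_j = 0$.
Thus,

$$\sum_{i=1}^{n} c_i = 0. \tag{2.2.8}$$

Moreover, since $b_i$ are arbitrary numbers, $c_i$ are arbitrary numbers for which (2.2.8) is true.
Indeed, if we choose arbitrary $b_i$'s for which $\sum_{i=1}^{n} b_i = 0$, by virtue of (2.2.7), $c_i = b_i$.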

Thus, we can forget about $b_i$'s, and claim that the set of all Pareto optimal solutions is
described by (2.2.5) where

• $\gamma_i$ is a certain non-random share of the total income $S$, which the $i$th participant
will have. The share $\gamma_i$ is completely specified by parameters $\beta_i$ in accordance with
(2.2.6).

• $c_i$'s are arbitrary numbers for which (2.2.8) holds.


Let, for example, $n = 2$, $\beta_1 = \beta_2$. Then, as is easy to check, all Pareto optimal solutions
may be represented as follows:

$$\tilde{Z}_1 = \frac{1}{2} S + c, \qquad \tilde{Z}_2 = \frac{1}{2} S - c,$$

where $c$ is an arbitrary constant.

To choose one Pareto optimal solution, the participants should choose a number $c$. Before
the exchange, the participants may not be in equal positions because $Y_1$, $Y_2$ may have
different distributions. Since in this case the participants share the total random income in
equal proportions, the payment $c$ is the only way to compensate for the inequality mentioned.
For example, the participants may come to the agreement that, on the exchange,
they will not make profit on the average. This amounts to the requirement $E\{\tilde{Z}_1\} = E\{Y_1\}$,
$E\{\tilde{Z}_2\} = E\{Y_2\}$. Then the participants should choose

$$c = \frac{1}{2}\left(E\{Y_1\} - E\{Y_2\}\right).$$
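The formulas of Example 1 are easy to evaluate numerically. The following is a minimal sketch, not part of the original text: the function names and the sample values of $\beta_i$ and $b_i$ are hypothetical and chosen only for illustration. It computes the shares (2.2.6) and the constants (2.2.7), and the payment $c$ for the case $n = 2$, $\beta_1 = \beta_2$.

```python
def pareto_shares(betas):
    """Shares gamma_i of the total income S, formula (2.2.6)."""
    inverse = [1.0 / beta for beta in betas]
    total = sum(inverse)
    return [value / total for value in inverse]

def side_payments(betas, b):
    """Constants c_i = b_i - gamma_i * sum_j b_j, formula (2.2.7)."""
    gammas = pareto_shares(betas)
    b_total = sum(b)
    return [b_i - gamma * b_total for b_i, gamma in zip(b, gammas)]

def zero_profit_payment(mean_y1, mean_y2):
    """Case n = 2, beta_1 = beta_2: the c for which E{Z_1} = E{Y_1}."""
    return 0.5 * (mean_y1 - mean_y2)

betas = [0.5, 1.0, 2.0]   # hypothetical parameters beta_i
b = [0.3, -1.2, 2.0]      # hypothetical arbitrary numbers b_i
gammas = pareto_shares(betas)
c = side_payments(betas, b)
print(gammas, sum(gammas))  # positive shares adding up to one
print(c, sum(c))            # constants adding up to zero, as in (2.2.8)
```

Note that the less risk averse a participant is (the smaller $\beta_i$), the larger the share $\gamma_i$ of the total income the participant takes.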


Consider now the model of proportional mutual reinsurance as we built it in Examples 
2.1-4 and 6, keeping the same notation. Thus, the quality-index-functions of two companies 
are 


$$V_1(r) = V_1(r_1, r_2) = E\{u_1(c_{r1} - X_{r1})\}, \qquad V_2(r) = V_2(r_1, r_2) = E\{u_2(c_{r2} - X_{r2})\},$$

where
$$X_{r1} = r_1 X_1 + (1 - r_2) X_2, \qquad X_{r2} = (1 - r_1) X_1 + r_2 X_2.$$

Let us assume that the part of a risk to be reinsured comes with the corresponding part
of the premium. More precisely, it means that, when ceding the part $(1 - r_1)X_1$, the first
company also yields the premium $(1 - r_1)c_1$, and the same concerns the second company.
So, the companies share their portfolios rather than their risks. Thus,

$$c_{r1} = r_1 c_1 + (1 - r_2) c_2, \qquad c_{r2} = (1 - r_1) c_1 + r_2 c_2. \tag{2.2.9}$$


The next example shows what we can expect in this case. 


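EXAMPLE 2. Let $X_1$, $X_2$ be independent and have the $\Gamma$-distributions with parameters
$(a_1, \nu_1)$ and $(a_2, \nu_2)$, respectively. Set $m_i = E\{X_i\} = 1/a_i$. Assume that

$$u_1(x) = u_2(x) = -e^{-\beta x}.$$

We consider only the case $0 < \beta < \min(m_1^{-1}, m_2^{-1})$ because, as we will see, otherwise
$V_1(r)$, $V_2(r)$ would not exist. We have

$$\begin{aligned}
V_1(r_1, r_2) &= -E\{\exp\{-\beta[c_{r1} - r_1 X_1 - (1 - r_2) X_2]\}\} \\
&= -\exp\{-\beta c_{r1}\}\, E\{\exp\{\beta[r_1 X_1 + (1 - r_2) X_2]\}\} \\
&= -\exp\{-\beta c_{r1}\}\, E\{\exp\{\beta r_1 X_1\}\}\, E\{\exp\{\beta (1 - r_2) X_2\}\} \\
&= -\exp\{-\beta c_{r1}\} \frac{1}{(1 - \beta r_1 m_1)^{\nu_1}} \cdot \frac{1}{(1 - \beta (1 - r_2) m_2)^{\nu_2}}.
\end{aligned}$$
(For the m.g.f. of the $\Gamma$-distribution see Section 0.4.3.5.)
Similarly,

$$V_2(r_1, r_2) = -\exp\{-\beta c_{r2}\} \frac{1}{(1 - \beta (1 - r_1) m_1)^{\nu_1}} \cdot \frac{1}{(1 - \beta r_2 m_2)^{\nu_2}}.$$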


Note now that to find Pareto optimal points, one does not need to know concrete values of 
the functions under consideration, but rather the areas where these functions are increasing 
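or decreasing. Consequently, if instead of the functions $V_1(r_1, r_2)$, $V_2(r_1, r_2)$ we consider
the quality indices $g(V_1(r_1, r_2))$, $g(V_2(r_1, r_2))$, where $g(x)$ is a strictly increasing function,
then the set of Pareto optimal solutions will be the same. So, we may simplify the problem,
taking $g(x) = \ln(-1/x)$, and instead of $V_1$, $V_2$, considering the functions

$$\begin{aligned}
Q_1(r_1, r_2) = g(V_1(r_1, r_2)) &= \ln\left[\exp\{\beta c_{r1}\}(1 - \beta m_1 r_1)^{\nu_1} (1 - \beta m_2 (1 - r_2))^{\nu_2}\right] \\
&= \beta c_{r1} + \nu_1 \ln(1 - \beta m_1 r_1) + \nu_2 \ln(1 - \beta m_2 (1 - r_2)),
\end{aligned} \tag{2.2.10}$$

$$\begin{aligned}
Q_2(r_1, r_2) = g(V_2(r_1, r_2)) &= \ln\left[\exp\{\beta c_{r2}\}(1 - \beta m_1 (1 - r_1))^{\nu_1} (1 - \beta m_2 r_2)^{\nu_2}\right] \\
&= \beta c_{r2} + \nu_1 \ln(1 - \beta m_1 (1 - r_1)) + \nu_2 \ln(1 - \beta m_2 r_2).
\end{aligned} \tag{2.2.11}$$

Taking into account (2.2.9), after simple algebra, we have

$$\nabla Q_1(r_1, r_2) = \left(\beta c_1 - \nu_1 \frac{\beta m_1}{1 - \beta m_1 r_1},\; -\beta c_2 + \nu_2 \frac{\beta m_2}{1 - \beta m_2 (1 - r_2)}\right),$$

$$\nabla Q_2(r_1, r_2) = \left(-\beta c_1 + \nu_1 \frac{\beta m_1}{1 - \beta m_1 (1 - r_1)},\; \beta c_2 - \nu_2 \frac{\beta m_2}{1 - \beta m_2 r_2}\right).$$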


As we showed in Section 2.1, for $(r_1, r_2)$ to be a Pareto optimal point, the vectors
$\nabla Q_1$, $\nabla Q_2$ should have opposite directions. In this case the ratio of the coordinates for
both vectors should be the same, which leads to the equation

$$\left(c_1 - \frac{\nu_1 m_1}{1 - \beta m_1 r_1}\right)\left(c_2 - \frac{\nu_2 m_2}{1 - \beta m_2 r_2}\right) = \left(c_2 - \frac{\nu_2 m_2}{1 - \beta m_2 (1 - r_2)}\right)\left(c_1 - \frac{\nu_1 m_1}{1 - \beta m_1 (1 - r_1)}\right). \tag{2.2.12}$$

(The l.-h.s. is the product of the first coordinate of the first vector and the second coordinate
of the second vector, and the r.-h.s. is the product of the second coordinate of the first vector
and the first coordinate of the second vector. We divided both sides by $\beta^2$.)

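This is an equation for $(r_1, r_2)$. To realize how the corresponding set can look, consider
the case

$$m_1 = m_2, \quad \beta m_i = 1, \quad \text{and} \quad c_i = (1 + \theta) m_i.$$

Then $m$'s will cancel, and (2.2.12) may be written as

$$\left((1 + \theta) - \frac{\nu_1}{1 - r_1}\right)\left((1 + \theta) - \frac{\nu_2}{1 - r_2}\right) = \left((1 + \theta) - \frac{\nu_1}{r_1}\right)\left((1 + \theta) - \frac{\nu_2}{r_2}\right). \tag{2.2.13}$$

First, let $\nu_1 = \nu_2 = \nu$. Setting in (2.2.13) $r_2 = 1 - r_1$, we
come to an identity. Hence, the set

$$r_1 + r_2 = 1, \quad 0 \le r_1 \le 1, \quad 0 \le r_2 \le 1, \tag{2.2.14}$$

is one of the solutions of (2.2.13); see Fig.13. We skip calculations
showing that other solutions, if any, do not fit the
conditions of this problem.

FIGURE 13.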
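The claim that the diagonal $r_1 + r_2 = 1$ solves the collinearity condition in the symmetric case can be checked numerically. The sketch below is an illustration only: it takes the gradients of $Q_1$, $Q_2$ obtained by differentiating (2.2.10) and (2.2.11) with (2.2.9), and the parameter values ($\beta = 0.5$, $m = 1$, $\nu = 2$, $c = 2.2$) are hypothetical, chosen so that all logarithms are defined.

```python
def grad_q1(r1, r2, beta, m1, m2, nu1, nu2, c1, c2):
    """Gradient of Q_1 from (2.2.10), using c_r1 = r1*c1 + (1 - r2)*c2."""
    return (beta * c1 - nu1 * beta * m1 / (1 - beta * m1 * r1),
            -beta * c2 + nu2 * beta * m2 / (1 - beta * m2 * (1 - r2)))

def grad_q2(r1, r2, beta, m1, m2, nu1, nu2, c1, c2):
    """Gradient of Q_2 from (2.2.11), using c_r2 = (1 - r1)*c1 + r2*c2."""
    return (-beta * c1 + nu1 * beta * m1 / (1 - beta * m1 * (1 - r1)),
            beta * c2 - nu2 * beta * m2 / (1 - beta * m2 * r2))

def collinearity_gap(r1, r2, **params):
    """Zero exactly when the two gradients are proportional."""
    g1 = grad_q1(r1, r2, **params)
    g2 = grad_q2(r1, r2, **params)
    return g1[0] * g2[1] - g1[1] * g2[0]

# Hypothetical symmetric setting: equal scale, shape, and premium.
params = dict(beta=0.5, m1=1.0, m2=1.0, nu1=2.0, nu2=2.0, c1=2.2, c2=2.2)
on_diagonal = [collinearity_gap(r, 1 - r, **params) for r in (0.1, 0.3, 0.5, 0.8)]
off_diagonal = collinearity_gap(0.3, 0.3, **params)
print(on_diagonal)   # numerically zero along r1 + r2 = 1
print(off_diagonal)  # non-zero away from the diagonal
```

A zero gap only says that the gradients are proportional; whether their directions are opposite has to be checked separately, as the text notes for the first-derivative analysis in general.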


If $\nu_1 \ne \nu_2$, the problem becomes more interesting. We depict the plots of the set (2.2.13),
made by Maple, in Fig.14a for $\theta = 0.1$, $\nu_1 = 1$, $\nu_2 = 5$, and in Fig.14b for $\theta = 0.4$, $\nu_1 = 3$,
$\nu_2 = 10$. In Exercise 11, we explain why in the last case the solution is close to (2.2.14).


FIGURE 14. 


In conclusion, note that since we used the "first-derivative-analysis", rigorously speaking,
we still did not prove that the sets above are Pareto optimal. (This is similar to one-
dimensional optimization problems: the fact that the first derivative equals zero at a point 
does not imply that this point is a maximizer or minimizer.) To complete the proof we 
should consider either the second derivatives of the functions (2.2.10)-(2.2.11), or the be- 
havior of the gradients in a neighborhood of the sets above. Calculations are elementary, 
although they are lengthy, so we skip them also. 


2.3. The case of the mean-variance criterion 
In this section we consider the case of proportional reinsurance described in Examples 
2.1-2, 4, 6. 


Assume $X_1$, $X_2$ to be independent, and set $m_i = E\{X_i\}$, $\sigma_i^2 = Var\{X_i\}$, $i = 1, 2$. Suppose
$\sigma_i > 0$.


We start with a relatively simple problem which corresponds to the case $\tau = 0$ in (2.1.6).


2.3.1 Minimization of variances 


Here, we assume that the companies try to minimize the variances of their future profit. 


Such a setup may be reasonable, for example, if the companies agree not to make profit 
on the exchange, at least, on the average, and under this condition, the companies try to 
reduce the riskiness of their portfolios. 


(Not making profit on the average, certainly, amounts to the condition $E\{Z_{ri}\} = E\{Y_i\}$
for each $r$ and $i = 1, 2$. In our case, this is true if, when ceding the risk $(1 - r_i)X_i$, the $i$th
company pays the net premium $(1 - r_i)m_i$, that is, the expected future payment for the risk
ceded.
Indeed, for the first company this ends up with the premium $c_{r1} = c_1 - (1 - r_1)m_1 + (1 - r_2)m_2$
(since the second company will pay the net premium $(1 - r_2)m_2$ to the first). Then $E\{Z_{r1}\} =
c_{r1} - E\{X_{r1}\} = c_1 - (1 - r_1)m_1 + (1 - r_2)m_2 - r_1 m_1 - (1 - r_2)m_2 = c_1 - m_1 = E\{Y_1\}$. The
same is true for the second company.)

Thus, the companies try to minimize the values of the functions

$$V_i(r_1, r_2) = Var\{Z_{ri}\} = Var\{X_{ri}\}.$$

Formally, if we set $\tau = 0$ in (2.1.6), we will come to $V(Y) = -Var\{Y\}$. In this case, it is
convenient to omit the minus, and consider the corresponding minimization problem. Certainly,
the problems of maximization of $-Var\{Y\}$ and minimization of $Var\{Y\}$ are equivalent.
Below, we will use the symbol $V_i(r_1, r_2)$ for the corresponding variance.

Since $X_1$, $X_2$ are independent and $X_{r1} = r_1 X_1 + (1 - r_2)X_2$, $X_{r2} = (1 - r_1)X_1 + r_2 X_2$, we
have

$$V_1(r_1, r_2) = r_1^2 \sigma_1^2 + (1 - r_2)^2 \sigma_2^2, \qquad V_2(r_1, r_2) = (1 - r_1)^2 \sigma_1^2 + r_2^2 \sigma_2^2. \tag{2.3.1}$$


To find Pareto optimal points, we again apply the method suggested in Section 2.1. In
our case, it consists in minimization of the linear combination $a_1 V_1(r_1, r_2) + a_2 V_2(r_1, r_2)$ for
positive $a_1$, $a_2$. So, we deal with a simple quadratic function which is concave upward and
has a unique minimum; see also Exercise 3. For this reason, we restrict ourselves below to
the first-derivative-analysis, looking for points where the gradients $\nabla V_1$, $\nabla V_2$ have opposite
directions.

The same concerns the problem of portfolio redistribution we will consider in Section
2.3.2.

For the gradients we have

$$\nabla V_1 = 2\left(r_1 \sigma_1^2,\; -(1 - r_2)\sigma_2^2\right), \qquad \nabla V_2 = 2\left(-(1 - r_1)\sigma_1^2,\; r_2 \sigma_2^2\right).$$

If (2.1.14) is true, the ratio of the coordinates of $\nabla V_1$ and $\nabla V_2$ is the same, which amounts to

$$\frac{r_1}{1 - r_2} = \frac{1 - r_1}{r_2}.$$
This is equivalent to
$$r_1 + r_2 = 1; \tag{2.3.2}$$
see Fig.13.
Set $r_1 = r$. Then in the line (2.3.2), $r_2 = 1 - r$, and
$$V_1(r, 1 - r) = r^2[\sigma_1^2 + \sigma_2^2], \tag{2.3.3}$$
$$V_2(r, 1 - r) = (1 - r)^2[\sigma_1^2 + \sigma_2^2]. \tag{2.3.4}$$


For the first company, the smaller the $r$, the better. At the status-quo point, that is, at
$r = (1, 1)$, the value $V_1(1, 1) = \sigma_1^2$. Hence, the first company would accept reinsurance
only if $V_1(r, 1 - r) \le \sigma_1^2$. In view of (2.3.3), this corresponds to $r \le \sigma_1 / \sqrt{\sigma_1^2 + \sigma_2^2}$.

Similarly, the second company would accept reinsurance if $1 - r \le \sigma_2 / \sqrt{\sigma_1^2 + \sigma_2^2}$.
Thus, the companies should choose the retention coefficients from the segment

$$r_1 + r_2 = 1, \qquad 1 - \frac{\sigma_2}{\sqrt{\sigma_1^2 + \sigma_2^2}} \le r_1 \le \frac{\sigma_1}{\sqrt{\sigma_1^2 + \sigma_2^2}}. \tag{2.3.5}$$

The reader is encouraged to show that the left member of (2.3.5) is not greater than the right
member. Fig.15a depicts the set (2.3.5), and Fig.15b the corresponding set of Pareto optimal
points in the plane $(V_1, V_2)$. Since now we minimize the values of $V_1$ and $V_2$, the Pareto
optimal points correspond to the "South-West" boundary in Fig.15b.

FIGURE 15. (a), (b).


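EXAMPLE 1. (a) Let $\sigma_1 = 3$, $\sigma_2 = 4$. Then (2.3.5) is equivalent to
$$r_1 + r_2 = 1, \quad 0.2 \le r_1 \le 0.6; \tag{2.3.6}$$

(b) However, if $\sigma_1 = 10$, $\sigma_2 = 24$, then the set of Pareto optimal solutions is

$$r_1 + r_2 = 1, \quad \frac{1}{13} \le r_1 \le \frac{5}{13}, \quad \text{or, equivalently,} \quad \frac{8}{13} \le r_2 \le \frac{12}{13}, \tag{2.3.7}$$

and does not include the middle point $(\frac{1}{2}, \frac{1}{2})$.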
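The bounds in (2.3.5) are simple to compute. The following minimal sketch (the function name is hypothetical) reproduces both parts of Example 1.

```python
from math import hypot

def variance_pareto_segment(sigma1, sigma2):
    """Bounds on r_1 in (2.3.5); on the segment, r_2 = 1 - r_1."""
    norm = hypot(sigma1, sigma2)  # sqrt(sigma1**2 + sigma2**2)
    return 1.0 - sigma2 / norm, sigma1 / norm

low_a, high_a = variance_pareto_segment(3.0, 4.0)    # Example 1(a)
low_b, high_b = variance_pareto_segment(10.0, 24.0)  # Example 1(b)
print(low_a, high_a)  # approximately 0.2 and 0.6
print(low_b, high_b)  # approximately 1/13 and 5/13
```

In case (b) the upper bound is below 1/2, which is why the middle point is excluded: the second company carries much more variance, and an equal split would make the first company worse off than at the status-quo point.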


Once the set of Pareto optimal solutions is determined, the companies should establish 
an additional principle leading to the choice of one solution from the set mentioned. Such 
a principle may require the volume of exchange to be, in some way or another, balanced. 

Assume, for example, that the cost of reinsurance is proportional to the mean value of 
the reinsured risk. More precisely, for the risk $(1 - r_1)X_1$ to be reinsured, the first company
should pay to the second company the amount $(1 + \theta_{reins})(1 - r_1)m_1$, where $\theta_{reins}$ is a
reinsurance loading coefficient. Respectively, the second company pays to the first the
amount $(1 + \theta_{reins})(1 - r_2)m_2$. For the exchange to be balanced, we may require these two
amounts to be equal (which is equivalent to the condition that the companies do not pay
each other). Since in this case the factor $1 + \theta_{reins}$ cancels, we have

$$(1 - r_1)m_1 = (1 - r_2)m_2. \tag{2.3.8}$$

Together with the condition $r_1 + r_2 = 1$, we obtain a simple solution

$$r_1 = \frac{m_1}{m_1 + m_2}, \qquad r_2 = \frac{m_2}{m_1 + m_2}. \tag{2.3.9}$$


It is worth emphasizing two things. 

First, we did not establish rule (2.3.8) in the very beginning. Only at the very end, when 
the set of Pareto optimal points has been determined, we impose condition (2.3.8) as an 
additional requirement. 

Secondly, rule (2.3.8) may not lead to an appropriate solution. 


EXAMPLE 2. Let $m_1 = 4$, $m_2 = 6$. Then $r_1 = 0.4$, $r_2 = 0.6$. In the situation of Example
1a, the point $(0.4, 0.6)$ belongs to the set of Pareto optimal solutions (2.3.6), so we can
choose this point as a particular solution. 

However, in the situation of Example 1b, the point (0.4,0.6) does not belong to the set 
(2.3.7), so in this case, (2.3.8) cannot serve as an additional condition. 


The last example points out a disadvantage of the expected value premium principle that 
does not take into account the risk to be carried by the reinsurer. 

Suppose now that, when ceding the risk $(1 - r_i)X_i$, the $i$th company pays the net premium
$(1 - r_i)m_i$ as we considered above, plus an additional security premium for reinsurance
which is proportional to the standard deviation of the risk reinsured. More precisely, this
additional payment is equal to $\lambda_{reins}(1 - r_i)\sigma_i$, where $\lambda_{reins}$ is a loading coefficient. Such
a rule corresponds to the standard deviation premium principle we considered in Section
10.4; see also (10.4.2).

Assume now that the companies agree that these two additional premiums will compensate
each other, so as a matter of fact, no company pays an additional reinsurance premium.
This is equivalent to the condition

$$(1 - r_1)\sigma_1 = (1 - r_2)\sigma_2,$$

which together with $r_1 + r_2 = 1$, leads to

$$r_1 = \frac{\sigma_1}{\sigma_1 + \sigma_2}, \qquad r_2 = \frac{\sigma_2}{\sigma_1 + \sigma_2}. \tag{2.3.10}$$


This solution looks quite natural. Note that, unlike the case of (2.3.9), the solution (2.3.10) 
always (!) belongs to the Pareto optimal set (2.3.5). We show this in Exercise 13. For 
particular examples, consider Exercise 14. 
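The contrast between rules (2.3.9) and (2.3.10) can be illustrated on the data of Examples 1 and 2. This is a sketch with hypothetical function names; it does not replace the proof asked for in Exercise 13.

```python
from math import hypot

def pareto_segment(sigma1, sigma2):
    """Bounds on r_1 from (2.3.5)."""
    norm = hypot(sigma1, sigma2)
    return 1.0 - sigma2 / norm, sigma1 / norm

def expected_value_rule(m1, m2):
    """r_1 from (2.3.9); r_2 = 1 - r_1."""
    return m1 / (m1 + m2)

def standard_deviation_rule(sigma1, sigma2):
    """r_1 from (2.3.10); r_2 = 1 - r_1."""
    return sigma1 / (sigma1 + sigma2)

def is_acceptable(r1, sigma1, sigma2):
    """True if the point (r1, 1 - r1) lies in the segment (2.3.5)."""
    low, high = pareto_segment(sigma1, sigma2)
    return low <= r1 <= high

for sigma1, sigma2 in [(3.0, 4.0), (10.0, 24.0)]:  # Examples 1a and 1b
    r_mean = expected_value_rule(4.0, 6.0)         # m_1 = 4, m_2 = 6
    r_std = standard_deviation_rule(sigma1, sigma2)
    print(sigma1, sigma2,
          is_acceptable(r_mean, sigma1, sigma2),
          is_acceptable(r_std, sigma1, sigma2))
```

The reason the second rule never fails is that $\sigma_1 + \sigma_2 \ge \sqrt{\sigma_1^2 + \sigma_2^2}$, so $\sigma_1/(\sigma_1 + \sigma_2)$ cannot exceed the upper bound in (2.3.5), and, by the same inequality applied to $\sigma_2$, cannot fall below the lower one.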


The end of Route 2 


2.3.2 The exchange of portfolios 


Let the $i$th portfolio be specified by the premium $c_i$ and the future claim $X_i$. Now we
assume that the part of a risk to be reinsured comes with the corresponding part of the
premium. In other words, as in Section 2.2, we accept rule (2.2.9).

In this case, the profit of the first company is the r.v. $Z_{r1} = c_{r1} - X_{r1} = r_1 c_1 + (1 - r_2)c_2 -
r_1 X_1 - (1 - r_2)X_2 = r_1(c_1 - X_1) + (1 - r_2)(c_2 - X_2)$. Thus,

$$Z_{r1} = r_1 Y_1 + (1 - r_2)Y_2. \tag{2.3.11}$$
Similarly, the profit after reinsurance of the second company is
$$Z_{r2} = (1 - r_1)Y_1 + r_2 Y_2. \tag{2.3.12}$$

Now we adopt the mean-variance criterion (2.1.6), assuming for simplicity that the tolerance-to-risk
coefficient $\tau$ is the same for both companies. Suppose that $X_1$ and $X_2$, and hence $Y_1$
and $Y_2$, are independent. Then,

$$E\{Y_i\} = s_i = c_i - m_i, \quad \text{and} \quad Var\{Y_i\} = \sigma_i^2.$$

By (2.3.11), (2.3.12),

$$\begin{aligned}
V_1(r_1, r_2) &= \tau E\{Z_{r1}\} - Var\{Z_{r1}\} = \tau[r_1 s_1 + (1 - r_2)s_2] - [r_1^2 \sigma_1^2 + (1 - r_2)^2 \sigma_2^2], \\
V_2(r_1, r_2) &= \tau E\{Z_{r2}\} - Var\{Z_{r2}\} = \tau[(1 - r_1)s_1 + r_2 s_2] - [(1 - r_1)^2 \sigma_1^2 + r_2^2 \sigma_2^2].
\end{aligned} \tag{2.3.13}$$


Certainly, the problem includes that of the previous section. If $\tau = 0$, the functions $V_i$
do not involve premiums, so any premium redistribution may be assumed, including the
redistribution we considered in Section 2.3.1.

As in Section 2.3.1, we restrict ourselves to the first-derivative-analysis, that is, we will
specify the set where $\nabla V_1$ and $\nabla V_2$ have opposite directions. See the remark directly following
(2.3.1).

Differentiating $V_i$'s, we get the gradients

$$\nabla V_1 = \left(\tau s_1 - 2r_1 \sigma_1^2,\; -\tau s_2 + 2(1 - r_2)\sigma_2^2\right),$$
$$\nabla V_2 = \left(-\tau s_1 + 2(1 - r_1)\sigma_1^2,\; \tau s_2 - 2r_2 \sigma_2^2\right).$$

We will see that it is convenient to use the characteristics

$$\delta_1 = \frac{\tau s_1}{\sigma_1^2}, \qquad \delta_2 = \frac{\tau s_2}{\sigma_2^2}, \tag{2.3.14}$$

and represent the gradients by the formulas
$$\nabla V_1 = \left(\sigma_1^2[\delta_1 - 2r_1],\; \sigma_2^2[2 - \delta_2 - 2r_2]\right), \qquad \nabla V_2 = \left(\sigma_1^2[2 - \delta_1 - 2r_1],\; \sigma_2^2[\delta_2 - 2r_2]\right). \tag{2.3.15}$$
Let first, $\delta_1 = \delta_2 = 1$. Then

$$\nabla V_1 = \left(\sigma_1^2[1 - 2r_1],\; \sigma_2^2[1 - 2r_2]\right), \qquad \nabla V_2 = \left(\sigma_1^2[1 - 2r_1],\; \sigma_2^2[1 - 2r_2]\right).$$

Thus, in this case, $\nabla V_1 = \nabla V_2$, and $\nabla V_1 = \nabla V_2 = 0$ at $r = (r_1, r_2) = (\frac{1}{2}, \frac{1}{2})$. Hence, both
functions, $V_1$ and $V_2$, attain their maximum values at the same point $r = (\frac{1}{2}, \frac{1}{2})$.

Consequently, in this case the set of Pareto optimal solutions consists of one (!) point
$(\frac{1}{2}, \frac{1}{2})$, and for both companies the best solution is to divide the risks into equal shares. This
is an extreme case.


Assume now that at least one $\delta_i$ is not equal to one. If (2.1.14) is true, the ratio of the
coordinates of $\nabla V_1$ and $\nabla V_2$ should be the same, which implies that

$$\frac{\tau s_1 - 2r_1 \sigma_1^2}{-\tau s_2 + 2(1 - r_2)\sigma_2^2} = \frac{-\tau s_1 + 2(1 - r_1)\sigma_1^2}{\tau s_2 - 2r_2 \sigma_2^2}. \tag{2.3.16}$$

Straightforward algebraic manipulation shows that (2.3.16) is equivalent to the equation

$$(1 - \delta_2)r_1 + (1 - \delta_1)r_2 = 1 - \frac{1}{2}(\delta_1 + \delta_2). \tag{2.3.17}$$
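The algebraic manipulation can be sketched as follows. This is one possible route, written out here for convenience, which uses only the gradients in the form (2.3.15): after cross-multiplication in (2.3.16) the factors $\sigma_1^2 \sigma_2^2$ cancel, and the products $4 r_1 r_2$ and $\delta_1 \delta_2$ appear on both sides.

```latex
\begin{align*}
(\delta_1 - 2r_1)(\delta_2 - 2r_2) &= (2 - \delta_2 - 2r_2)(2 - \delta_1 - 2r_1), \\
\delta_1\delta_2 - 2\delta_1 r_2 - 2\delta_2 r_1 + 4r_1 r_2
  &= (2 - \delta_1)(2 - \delta_2) - 2r_1(2 - \delta_2) - 2r_2(2 - \delta_1) + 4r_1 r_2, \\
0 &= 4 - 2(\delta_1 + \delta_2) - 4r_1(1 - \delta_2) - 4r_2(1 - \delta_1), \\
(1 - \delta_2)r_1 + (1 - \delta_1)r_2 &= 1 - \tfrac{1}{2}(\delta_1 + \delta_2).
\end{align*}
```

For instance, for $\delta_1 = 5$, $\delta_2 = 9$ the last line reads $-8r_1 - 4r_2 = -6$, that is, $4r_1 + 2r_2 = 3$.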


Thus, the set of Pareto-optimal solutions is a subset of the line (2.3.17). Let us now 
consider some particular cases. 


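2.3.2.1 The symmetric case. Let $\delta_1 = \delta_2 = \delta \ne 1$. It is not difficult to verify that in this
case $\delta$ in (2.3.17) cancels, and we again come to the segment

$$r_1 + r_2 = 1, \quad 0 \le r_1 \le 1, \quad 0 \le r_2 \le 1. \tag{2.3.18}$$

By (2.3.14), in our case,
$$\tau s_1 = \delta \sigma_1^2, \qquad \tau s_2 = \delta \sigma_2^2. \tag{2.3.19}$$

Let $r_1 = r$. Then, in accordance with (2.3.13), (2.3.15), and (2.3.19), in the segment
(2.3.18),

$$V_1(r, 1 - r) = \tau r[s_1 + s_2] - r^2[\sigma_1^2 + \sigma_2^2] = (\sigma_1^2 + \sigma_2^2)[\delta r - r^2], \tag{2.3.20}$$
$$V_2(r, 1 - r) = \tau(1 - r)[s_1 + s_2] - (1 - r)^2[\sigma_1^2 + \sigma_2^2] = (\sigma_1^2 + \sigma_2^2)[\delta(1 - r) - (1 - r)^2], \tag{2.3.21}$$
$$\nabla V_1(r, 1 - r) = \left(\sigma_1^2[\delta - 2r],\; \sigma_2^2[-\delta + 2r]\right) = (\delta - 2r) \cdot (\sigma_1^2, -\sigma_2^2), \tag{2.3.22}$$
$$\nabla V_2(r, 1 - r) = \left(\sigma_1^2[2 - \delta - 2r],\; \sigma_2^2[\delta - 2 + 2r]\right) = (2r + \delta - 2) \cdot (-\sigma_1^2, \sigma_2^2). \tag{2.3.23}$$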


Solutions we choose from (2.3.18) should satisfy the following two conditions.
(1) $\nabla V_1(r, 1 - r)$ and $\nabla V_2(r, 1 - r)$ must have opposite directions.
(2) The positions of both companies should be not worse than they were at the status-quo
point, namely,

$$V_1(r, 1 - r) \ge \tau s_1 - \sigma_1^2 = \sigma_1^2(\delta - 1), \qquad V_2(r, 1 - r) \ge \tau s_2 - \sigma_2^2 = \sigma_2^2(\delta - 1). \tag{2.3.24}$$

In view of (2.3.22) and (2.3.23), the first condition holds if $(\delta - 2r)$ and $(2r + \delta - 2)$, as
functions of $r$, have the same sign. It is not difficult to figure out (say, graphing these linear
functions) that for $0 \le r \le 1$

(i) if $\delta \ge 2$, this is always true;

(ii) if $1 < \delta < 2$, this is true for $1 - \frac{\delta}{2} \le r \le \frac{\delta}{2}$;
(iii) if $0 < \delta < 1$, this is true for $\frac{\delta}{2} \le r \le 1 - \frac{\delta}{2}$.

Note that, if $\delta = 1$, we have only one point $r = \frac{1}{2}$, which we already observed above.


In view of (2.3.20) and (2.3.21), condition (2.3.24) is equivalent to the quadratic inequalities

$$(\sigma_1^2 + \sigma_2^2)[\delta r - r^2] \ge \sigma_1^2(\delta - 1), \qquad (\sigma_1^2 + \sigma_2^2)[\delta(1 - r) - (1 - r)^2] \ge \sigma_2^2(\delta - 1). \tag{2.3.25}$$

EXAMPLE 1. (a) Let $\sigma_1 = 1$, $\sigma_2 = 2$, $\delta = 2$. These numbers determine all the necessary
information about $s_1$, $s_2$, $\tau$. Since $\delta = 2$, the condition (i) holds automatically, and we
should solve (2.3.25) which is equivalent to

$$5[2r - r^2] \ge 1, \qquad 5[2(1 - r) - (1 - r)^2] \ge 4.$$

The solution to these inequalities is

$$1 - \frac{2}{\sqrt{5}} \le r \le \frac{1}{\sqrt{5}},$$

so approximately we deal with the segment

$$\{0.105 \le r_1 \le 0.447,\; r_2 = 1 - r_1\}.$$

(b) Let $\sigma_1 = 1$, $\sigma_2 = 2$, $\delta = 1.5$. Then we should solve the inequalities

$$5[1.5r - r^2] \ge 0.5, \qquad 5[1.5(1 - r) - (1 - r)^2] \ge 2.$$

The approximate solution here is $0.0699 \le r \le 0.6531$. However, we should now take into
account the requirement $1 - \frac{\delta}{2} \le r \le \frac{\delta}{2}$, which in our case is $0.25 \le r \le 0.75$. Hence, the
eventual answer is

$$\{0.25 \le r_1 \le 0.6531,\; r_2 = 1 - r_1\}.$$


The choice of one point requires an additional agreement similar to what we discussed in 
Section 2.3.1. 
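The procedure of this subsection (solve the two quadratic inequalities (2.3.25), then intersect with the sign condition (i)-(iii)) can be sketched in a few lines. The function names are hypothetical, and the sketch assumes $\delta > 0$, as in the text.

```python
from math import sqrt

def admissible_interval(delta, level):
    """Solve delta*x - x**2 >= level for x; return (low, high) or None."""
    discriminant = delta * delta - 4.0 * level
    if discriminant < 0:
        return None
    root = sqrt(discriminant)
    return (delta - root) / 2.0, (delta + root) / 2.0

def symmetric_case_segment(sigma1, sigma2, delta):
    """Range of r_1 = r on r_1 + r_2 = 1 meeting (2.3.25) and (i)-(iii)."""
    total = sigma1 ** 2 + sigma2 ** 2
    first = admissible_interval(delta, sigma1 ** 2 * (delta - 1.0) / total)   # in r
    second = admissible_interval(delta, sigma2 ** 2 * (delta - 1.0) / total)  # in 1 - r
    if first is None or second is None:
        return None
    low = max(0.0, first[0], 1.0 - second[1])
    high = min(1.0, first[1], 1.0 - second[0])
    if delta < 2.0:
        # Sign condition: r must lie between delta/2 and 1 - delta/2.
        a, b = sorted((delta / 2.0, 1.0 - delta / 2.0))
        low, high = max(low, a), min(high, b)
    return (low, high) if low <= high else None

segment_a = symmetric_case_segment(1.0, 2.0, 2.0)  # Example 1(a)
segment_b = symmetric_case_segment(1.0, 2.0, 1.5)  # Example 1(b)
print(segment_a)  # roughly (0.1056, 0.4472)
print(segment_b)  # roughly (0.25, 0.6531)
```

In case (b) the lower end comes from the sign condition rather than from the status-quo requirement, which matches the remark in the example.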


2.3.2.2 A typical situation in the general case. Consider now the case when $\delta_1 \ne \delta_2$.
The main point here is that, by virtue of (2.3.17), Pareto optimal solutions lie in a line
distinct from the diagonal of the square $R$. Note also that, in view of (2.3.14) we can
rewrite (2.3.13) as

$$V_1(r_1, r_2) = r_1 \delta_1 \sigma_1^2 + (1 - r_2)\delta_2 \sigma_2^2 - [r_1^2 \sigma_1^2 + (1 - r_2)^2 \sigma_2^2], \tag{2.3.26}$$
$$V_2(r_1, r_2) = (1 - r_1)\delta_1 \sigma_1^2 + r_2 \delta_2 \sigma_2^2 - [(1 - r_1)^2 \sigma_1^2 + r_2^2 \sigma_2^2]. \tag{2.3.27}$$

General formulas here are a bit cumbersome, so we restrict
ourselves to an example.

EXAMPLE 2. Let $\delta_1 = 5$, $\delta_2 = 9$. Substituting it into
(2.3.17), we come to the line

$$4r_1 + 2r_2 = 3. \tag{2.3.28}$$

FIGURE 16.


So, we should consider the segment connecting the points $(\frac{1}{4}, 1)$ and $(\frac{3}{4}, 0)$; see Fig.16.
Note that border points from the segments $\{r_2 = 1,\; 0 \le r_1 \le \frac{1}{4}\}$ and $\{r_2 = 0,\; \frac{3}{4} \le r_1 \le 1\}$,
see again Fig.16, are also Pareto optimal. (The condition $\nabla V_1 = -k\nabla V_2$ applies only to interior
points; the border or a part of it may be Pareto optimal in the absence of this condition.)
To show this, we should consider the gradients along these segments. However, we will not
do that since, as we will see, these segments should be excluded from consideration.
Consider the third, central, segment. Set again $r_1 = r$. Now $r \in [\frac{1}{4}, \frac{3}{4}]$. In accordance
with (2.3.15), substituting $r_2$ from (2.3.28), we have

$$\nabla V_1 = \left(\sigma_1^2[5 - 2r_1],\; \sigma_2^2[2 - 9 - 2r_2]\right) = \left(\sigma_1^2[5 - 2r],\; \sigma_2^2[-10 + 4r]\right) = (5 - 2r) \cdot (\sigma_1^2, -2\sigma_2^2),$$
$$\nabla V_2 = \left(\sigma_1^2[2 - 5 - 2r_1],\; \sigma_2^2[9 - 2r_2]\right) = (3 + 2r) \cdot (-\sigma_1^2, 2\sigma_2^2).$$
We see that $\nabla V_1$, $\nabla V_2$ have opposite directions for all $r \in [\frac{1}{4}, \frac{3}{4}]$.


Suppose now that $\sigma_1 = 1$, $\sigma_2 = 2$. Then, from (2.3.26), (2.3.27) it follows that in the line
(2.3.28)

$$\begin{aligned}
V_1(r_1, r_2) &= 5r_1 + 36(1 - r_2) - [r_1^2 + 4(1 - r_2)^2] \\
&= 5r + 18(4r - 1) - [r^2 + (4r - 1)^2] = -19 + 85r - 17r^2, \\
V_2(r_1, r_2) &= 5(1 - r_1) + 36r_2 - [(1 - r_1)^2 + 4r_2^2] \\
&= 5(1 - r) + 18(3 - 4r) - [(1 - r)^2 + (3 - 4r)^2] \\
&= 49 - 51r - 17r^2.
\end{aligned}$$

At the status-quo point, $V_1(1, 1) = \delta_1 \sigma_1^2 - \sigma_1^2 = 4$, $V_2(1, 1) = \delta_2 \sigma_2^2 - \sigma_2^2 = 32$. Consequently,
the restrictions $V_1(r_1, r_2) \ge V_1(1, 1)$, $V_2(r_1, r_2) \ge V_2(1, 1)$ are equivalent to the inequalities

$$-19 + 85r - 17r^2 \ge 4, \qquad 49 - 51r - 17r^2 \ge 32.$$

The solution to the system of these quadratic inequalities is

$$\frac{5}{2} - \frac{3}{34}\sqrt{629} \le r \le \frac{1}{2}\sqrt{13} - \frac{3}{2}.$$

Thus, the set of acceptable solutions for both companies is given approximately by
$$0.2871 \le r_1 \le 0.3027, \qquad 4r_1 + 2r_2 = 3$$

(again see Fig.16). The solution is close to the point $(0.3, 0.9)$, so in this case, additional
considerations are not necessary.
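The numbers of this example can be double-checked directly from (2.3.26)-(2.3.28). The sketch below uses hypothetical function names and hard-codes the data of the example ($\delta_1 = 5$, $\delta_2 = 9$, $\sigma_1 = 1$, $\sigma_2 = 2$) as default arguments.

```python
from math import sqrt

def v1(r1, r2, d1=5.0, d2=9.0, s1=1.0, s2=2.0):
    """Quality index of the first company, formula (2.3.26)."""
    return (r1 * d1 * s1**2 + (1 - r2) * d2 * s2**2
            - (r1**2 * s1**2 + (1 - r2)**2 * s2**2))

def v2(r1, r2, d1=5.0, d2=9.0, s1=1.0, s2=2.0):
    """Quality index of the second company, formula (2.3.27)."""
    return ((1 - r1) * d1 * s1**2 + r2 * d2 * s2**2
            - ((1 - r1)**2 * s1**2 + r2**2 * s2**2))

def r2_on_line(r1):
    """Second coordinate on the line 4*r1 + 2*r2 = 3, formula (2.3.28)."""
    return (3.0 - 4.0 * r1) / 2.0

lower = 2.5 - 3.0 * sqrt(629.0) / 34.0  # first company is indifferent here
upper = (sqrt(13.0) - 3.0) / 2.0        # second company is indifferent here
middle = 0.5 * (lower + upper)

print(lower, upper)                                 # about 0.2871 and 0.3028
print(v1(1, 1), v2(1, 1))                           # status-quo values 4 and 32
print(v1(middle, r2_on_line(middle)) - v1(1, 1))    # positive gain
print(v2(middle, r2_on_line(middle)) - v2(1, 1))    # positive gain
```

At the two endpoints one of the companies is exactly as well off as at the status-quo point, so the narrowness of the interval reflects how little room there is for a mutually beneficial exchange with these parameters.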


Other proofs, comments, and models of this type, and references may be found in [11, 
Chapter 5], [16], [31], [101]. 




3 REINSURANCE MARKET 
3.1 A model of the exchange market of random assets 


While in the previous sections we discussed direct risk exchange as a result of bargaining 
between companies, this section concerns the decentralized price mechanism of exchange. 
In this case, the participants of the market do not exchange their risks directly but trade 
them. 

Constructions of this section serve merely to illustrate some basic ideas and concepts, 
and results below should not be, certainly, considered practical recommendations. Modern 
markets are complex and sophisticated mechanisms that use various financial tools, and a 
pure exchange model may serve only as a first approximation. 

The classical model we discuss next is due to K. Arrow and H. Bühlmann (see, in particular,
[5], [6], [21], [22]).

Consider the general framework of Section 2.1. Assume that the participants of the
market are expected utility maximizers, and the $i$th participant has a utility function $u_i(x)$
and an initial wealth $s_i$. Then for an additional income $y$ (positive or negative) the utility of
the $i$th participant's wealth is equal to $u_i(s_i + y)$. Set $\tilde{u}_i(y) = u_i(s_i + y)$. The function $\tilde{u}_i(y)$
is the utility function which incorporates the initial wealth.

As usual, we fix a sample space $\Omega = \{\omega\}$ and a probability measure $P(A)$ defined on
events $A$ from $\Omega$. All r.v.'s under consideration are functions on $\Omega$.

To simplify calculations, we assume below that the space $\Omega$ is discrete: $\Omega = \{\omega_1, \omega_2, \ldots\}$,
though, as a matter of fact, the results at which we will arrive are true in a much more
general situation.

If we denote by $P(\omega_k)$ the probability of the outcome $\omega_k$, then for a r.v. $Y = Y(\omega)$,

$$E\{Y\} = \sum_k Y(\omega_k)P(\omega_k). \tag{3.1.1}$$

As in Section 2.1, we fix the vector $Y = (Y_1, \ldots, Y_n)$ of the original r.v.'s of future incomes.
Thus, the future wealth of the $i$th participant is $s_i + Y_i$, and its utility is $u_i(s_i + Y_i) = \tilde{u}_i(Y_i)$.
The r.v. $Y_i$ may be called the initial endowment (without the fixed initial wealth $s_i$) of the
participant $i$.

The participants of the market exchange the r.v.'s of their future incomes. Thus, the commodities
of the market are random variables, and the main concept we should define and
clarify is the price of a r.v. We assume that it is represented by a real-valued function
$G(Y)$ defined on the class of r.v.'s under consideration.

We do not exclude r.v.’s taking negative values, interpreting it as the case of losses, and 
we do not exclude the case when the price assumes a negative value. If an investor accepts 
a risk which may bring only losses, then the investor should be paid for this. This means 
that the price of this risk is negative. 

Unlike the price for a usual commodity, what should be called a price for a random 
income (say, a random asset) is a non-trivial question. 


We suppose that the price function $G(Y)$ is linear, that is, for any two r.v.'s $Y_1$ and $Y_2$, and
numbers $\alpha_1$ and $\alpha_2$,
$$G(\alpha_1 Y_1 + \alpha_2 Y_2) = \alpha_1 G(Y_1) + \alpha_2 G(Y_2). \tag{3.1.2}$$


This condition looks reasonable although it involves some simplification. As a matter 
of fact, prices are not linear: usually the more you buy the less you pay for a unit of 
commodity. 

We assume also that 


$$G(Y) \ge 0 \text{ for any non-negative r.v. } Y, \text{ and} \tag{3.1.3}$$
$$G(1) = 1; \text{ more precisely, } G(Y) = 1 \text{ for any r.v. } Y \equiv 1. \tag{3.1.4}$$


Such a price function $G$ may be represented as follows. Let $Q(A)$ be a probability measure
defined on events $A$ from $\Omega$. This measure may be different from the original "actual"
probability measure $P(A)$. Denote by $E_Q\{Y\}$ the expected value of $Y$ with respect to measure
$Q$. If $\Omega$ is discrete, and $Q(\omega_k)$ is the probability of $\omega_k$ with respect to measure $Q$,
then

$$E_Q\{Y\} = \sum_k Y(\omega_k)Q(\omega_k) \tag{3.1.5}$$

(compare with (3.1.1)).
Consider the function
$$G(Y) = G_Q(Y) = E_Q\{Y\}. \tag{3.1.6}$$

This function satisfies properties (3.1.2)-(3.1.4) as any expectation with respect to some
probability measure.

Moreover, it may be shown that under some mild conditions, any function $G(Y)$ with the
properties (3.1.2)-(3.1.4) is equal to $G_Q$ for some $Q$. In particular, it is true if the sample
space $\Omega$ is finite.
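On a finite sample space the price function (3.1.6) is just a weighted sum. The following minimal sketch illustrates this; the measure $Q$ and the r.v.'s below are hypothetical numbers chosen for illustration, and r.v.'s are represented as lists of values $Y(\omega_k)$.

```python
def price(y, q):
    """G_Q(Y) = E_Q{Y} = sum_k Y(omega_k) * Q(omega_k), see (3.1.5)-(3.1.6)."""
    return sum(y_k * q_k for y_k, q_k in zip(y, q))

q = [0.2, 0.3, 0.5]        # hypothetical pricing measure Q on three outcomes
y1 = [1.0, -2.0, 4.0]      # a r.v. that may bring losses
y2 = [0.0, 3.0, 1.0]       # a non-negative r.v.
combo = [2.0 * a - 3.0 * b for a, b in zip(y1, y2)]

print(price(y1, q), price(y2, q))
print(price(combo, q), 2.0 * price(y1, q) - 3.0 * price(y2, q))  # linearity (3.1.2)
print(price([1.0, 1.0, 1.0], q))                                 # normalization (3.1.4)
```

Property (3.1.3) holds because all weights $Q(\omega_k)$ are non-negative, and (3.1.4) because they add up to one.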

So, we adopt, as a definition of price, the representation (3.1.6). In this case, the determination
of a pricing procedure in the market amounts to the determination of a measure $Q$.
If $\Omega$ is discrete, in view of (3.1.5), to determine $Q$, it suffices to specify the "probabilities"
$Q(\omega_k)$.

It makes sense to emphasize again that the numbers $Q(\omega_k)$ differ from the actual probabilities
$P(\omega_k)$. One may say that $Q(\omega_k)$ reflects how the participants of the market estimate
the possibility of the outcome $\omega_k$ or, in other words, $Q(\omega_k)$ indicates the participants' beliefs
regarding $\omega_k$. These beliefs may differ from reality.


Given a pricing function $G(Y)$, the participant $i$ sells the endowment $Y_i$ for the price $G(Y_i)$
and purchases another r.v. $W_i$ for the price $G(W_i)$. This is equivalent to the exchange of $Y_i$ for
$W_i$ for the price $G(W_i) - G(Y_i)$. Note that $G(W_i) - G(Y_i) = G(W_i - Y_i)$, since the function
$G(\cdot)$ is linear.

We do not assume that $G(W_i) \le G(Y_i)$, and the value $G(W_i - Y_i)$ may have any sign. If
$G(W_i) > G(Y_i)$, we interpret it as if the participant borrowed some money to purchase $W_i$,
or took this money from the initial wealth. On the other hand, $G(W_i - Y_i)$ may be negative,
which means that after exchange, the participant received additional cash.

Eventually, the future income of the $i$th participant is

$$Z_i = W_i - G(W_i - Y_i), \tag{3.1.7}$$

and the expected utility is
$$V_i = E\{\tilde{u}_i(Z_i)\} = E\{\tilde{u}_i(W_i - G(W_i - Y_i))\}. \tag{3.1.8}$$


Certainly, the exchange should be balanced, that is,

$$\sum_{i=1}^{n} W_i = \sum_{i=1}^{n} Y_i. \tag{3.1.9}$$

Because the price function is linear, from (3.1.9) it follows that $\sum_{i=1}^{n} G(W_i) = \sum_{i=1}^{n} G(Y_i)$,
or

$$\sum_{i=1}^{n} G(W_i - Y_i) = 0. \tag{3.1.10}$$

That is, the total payments are also balanced. Say, if $n = 2$, one participant pays, and the
other receives.
From (3.1.10) it immediately follows that (3.1.9) is equivalent to the balance for the $Z$'s:

$$\sum_{i=1}^{n} Z_i = \sum_{i=1}^{n} Y_i. \tag{3.1.11}$$

Indeed, by virtue of (3.1.7), $\sum_{i=1}^{n} Z_i = \sum_{i=1}^{n} W_i - \sum_{i=1}^{n} G(W_i - Y_i)$, while the last sum equals
zero by (3.1.10).


Given a pricing function $G$, the $i$th participant determines the r.v. $W_i$ which maximizes
the expected utility (3.1.8). Denote the result of this maximization by $W_{iG}$. The r.v. $W_{iG}$ is
the demand of the participant $i$.

Note that $W_{iG}$ is not what the participant will buy. Rather it is what she/he wants to buy;
this is a demand. The point is that if a pricing procedure does not closely reflect the real
situation in the market, demand may not be equal to supply. Thus, we should not expect
that for an arbitrary $G$ the demand quantities $W_{iG}$ satisfy the balance requirement (3.1.9).


We call a function $G^*(Y)$ and a vector $W^* = (W_1^*, \ldots, W_n^*)$ an equilibrium price function
and an equilibrium demand vector, respectively, if

(a) $W_i^* = W_{iG^*}$ for all $i = 1, \ldots, n$, that is, for each participant, the demand is optimal with
respect to the pricing function $G^*$,

(b) the supply and demand are balanced, i.e.,
$$\sum_{i=1}^{n} W_i^* = \sum_{i=1}^{n} Y_i.$$
As was noted, the last property is equivalent to the balance equation
$$\sum_{i=1}^{n} Z_i^* = \sum_{i=1}^{n} Y_i,$$

where $Z_i^* = W_i^* - G^*(W_i^* - Y_i)$. So we can talk about the equilibrium vector $Z^* = (Z_1^*, \ldots, Z_n^*)$.


It may be proved that, under some mild conditions, the solution Z* exists and is Pareto 
optimal; see, e.g., [81], [86]. This is a remarkable fact. First, this means that under equi- 
librium prices, the market pricing mechanism of exchange leads to a solution which cannot 
be improved simultaneously in favor of all participants. Second, the pricing mechanism 
is decentralized. To reach the equilibrium solution, the participants do not have to enter 
into direct negotiations. Each participant can make decisions separately from the others, 
maximizing just her/his own expected utility of the future income. 


3.2 An example concerning reinsurance 


The classical example we consider here is due to K. Borch; see, e.g., [16]. 

Suppose the participants of the market are insurance companies and $Y_i = -X_i$, where $X_i$
is the original claim for the $i$th company. We interpret the quantity $s_i$ from the general
scheme above as the amount of the fund available to cover the claim $X_i$. In particular, we
assume that $s_i$ involves the premium paid. Then $s_i + Y_i = s_i - X_i$ is the future surplus of the
$i$th company.

Denote by $\tilde{X}_i$ the claim retained by the $i$th company. (In the notation of the previous
section, $W_i = -\tilde{X}_i$.) This means that the company purchases the reinsurance for $X_i - \tilde{X}_i$ and
pays for this the price

$$d_i = G(X_i - \tilde{X}_i), \tag{3.2.12}$$

where $G(\cdot)$ is a pricing function. Note that the r.v. $X_i - \tilde{X}_i$ may take on negative values (the
company reinsures somebody's risk). Similarly, $G(X_i - \tilde{X}_i)$ may be negative (overall, the
company behaves as a reinsurer).

The expected utility of the $i$th company is

$$V_i = E\{u_i(s_i - \tilde{X}_i - d_i)\}. \tag{3.2.13}$$
The exchange is balanced if
$$\sum_{i=1}^{n} \tilde{X}_i = \sum_{i=1}^{n} X_i. \tag{3.2.14}$$

Assume that the pricing function $G$ is generated by a probability measure $Q$ in the fashion
of the previous section. Then, in accordance with (3.1.5),

$$d_i = E_Q\{X_i - \tilde{X}_i\} = \sum_k [X_i(\omega_k) - \tilde{X}_i(\omega_k)]Q(\omega_k) = \sum_k (x_{ik} - \tilde{x}_{ik}) q_k,$$
where $x_{ik} = X_i(\omega_k)$, $\tilde{x}_{ik} = \tilde{X}_i(\omega_k)$, and $q_k = Q(\omega_k)$.
Setting $p_k = P(\omega_k)$, we rewrite (3.2.13) as

$$V_i = \sum_l p_l u_i\Big(s_i - \tilde{x}_{il} - \sum_k (x_{ik} - \tilde{x}_{ik}) q_k\Big). \tag{3.2.15}$$
(Certainly, the running indices in the exterior and interior sums should be denoted by different
symbols.)
Our goal is to find an equilibrium price $G$ and an equilibrium demand. Since $G$ is completely
characterized by the measure $Q$, our objects of study are the numbers $q_k$ and $\tilde{x}_{ik}$.


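First, considering just one company, we fix $i$ and maximize (3.2.15) with respect to the
vector $(\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots)$. For a fixed $m$, the partial derivative with respect to $\tilde{x}_{im}$ is

$$\begin{aligned}
\frac{\partial V_i}{\partial \tilde{x}_{im}} &= \frac{\partial}{\partial \tilde{x}_{im}} \sum_{l \ne m} p_l u_i\Big(s_i - \tilde{x}_{il} - \sum_k (x_{ik} - \tilde{x}_{ik}) q_k\Big) + \frac{\partial}{\partial \tilde{x}_{im}}\, p_m u_i\Big(s_i - \tilde{x}_{im} - \sum_k (x_{ik} - \tilde{x}_{ik}) q_k\Big) \\
&= \sum_{l \ne m} p_l u_i'\Big(s_i - \tilde{x}_{il} - \sum_k (x_{ik} - \tilde{x}_{ik}) q_k\Big) q_m + p_m u_i'\Big(s_i - \tilde{x}_{im} - \sum_k (x_{ik} - \tilde{x}_{ik}) q_k\Big)(-1 + q_m) \\
&= q_m \sum_l p_l u_i'\Big(s_i - \tilde{x}_{il} - \sum_k (x_{ik} - \tilde{x}_{ik}) q_k\Big) - p_m u_i'\Big(s_i - \tilde{x}_{im} - \sum_k (x_{ik} - \tilde{x}_{ik}) q_k\Big) \\
&= q_m E\{u_i'(s_i - \tilde{X}_i - d_i)\} - p_m u_i'(s_i - \tilde{x}_{im} - d_i).
\end{aligned}$$

Setting all partial derivatives equal to zero, we have a system of equations for the equilibrium
prices and equilibrium demand vectors; namely

$$q_m E\{u_i'(s_i - \tilde{X}_i - d_i)\} = p_m u_i'(s_i - \tilde{x}_{im} - d_i) \quad \text{for all } i = 1, \ldots, n, \text{ and } m = 1, 2, \ldots. \tag{3.2.16}$$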


EXAMPLE 1. Let 
e 1, 
m(x) = E +x. 
The function above is increasing for x < 1, so we should assume all r.v.’s s;— X;, and 
Si — Xi — di to be bounded by one. 


Suppose also, for the sake of simplicity, that all s; = 1. 
Since in our case u(x) = 1 — x, and s; = 1, equations (3.2.16) may be written as 


am (1-E { (1-X;-4i) }) = Pn (1 (1 -3m =), 


or 
gm(E{Xi} + di) = Pim (im + di). (3.2.17) 
Note that, in view of (3.2.14),

$$\sum_{i=1}^{n} d_i = 0 \tag{3.2.18}$$

(see also (3.1.10)). Set $S = X_1 + \ldots + X_n$, and observe that, by virtue of (3.2.14), $\sum_{i=1}^{n} \tilde x_{im} = \sum_{i=1}^{n} x_{im} = s_m$, where $s_m$ is the value of $S = S(\omega)$ at $\omega = \omega_m$, that is, $s_m = S(\omega_m)$. Summing over i in both sides of (3.2.17) and taking into account (3.2.18), we get that

$$q_m E\{S\} = p_m s_m,$$

or

$$q_m = \frac{s_m}{E\{S\}}\, p_m. \tag{3.2.19}$$




Recalling the definition of $q_m$, we can write it in the following more explicit form:

$$Q(\omega_m) = \frac{S(\omega_m)}{E\{S\}}\, P(\omega_m). \tag{3.2.20}$$

Thus, we have found the equilibrium pricing measure Q. For such a measure, the price of any r.v. Y is

$$G_Q(Y) = E_Q\{Y\} = \sum_m Y(\omega_m) Q(\omega_m) = \frac{1}{E\{S\}} \sum_m Y(\omega_m) S(\omega_m) P(\omega_m) = \frac{E\{YS\}}{E\{S\}}.$$

Thus,

$$G(Y) = \frac{E\{YS\}}{E\{S\}}. \tag{3.2.21}$$


Note that we have assumed Q to be discrete just to simplify proofs. Formula (3.2.21) 
looks meaningful in the general case, and it indeed may be obtained under rather weak 
conditions. 
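The passage from (3.2.20) to (3.2.21) can be checked numerically on a small discrete space. The sketch below is only an illustration: the three outcomes, their probabilities, and the values of S and Y are hypothetical numbers chosen for the check, not data from the text. Exact fractions are used so that the two ways of computing the price can be compared without rounding.

```python
from fractions import Fraction as F

# Hypothetical three-outcome probability space (illustrative numbers only).
P = [F(1, 2), F(1, 3), F(1, 6)]   # P(omega_m)
S = [F(1), F(2), F(4)]            # total claim S(omega_m)
Y = [F(0), F(1), F(3)]            # some r.v. Y(omega_m) to be priced


def expect(values, probs):
    """Expected value of a discrete r.v. given the outcome probabilities."""
    return sum(v * p for v, p in zip(values, probs))


ES = expect(S, P)

# Equilibrium measure (3.2.20): Q(omega_m) = S(omega_m) P(omega_m) / E{S}.
Q = [s * p / ES for s, p in zip(S, P)]

# Price computed as E_Q{Y} and, alternatively, by formula (3.2.21).
price_via_Q = expect(Y, Q)
price_via_P = expect([y * s for y, s in zip(Y, S)], P) / ES

print("Q =", Q)
print("price via Q:", price_via_Q, " price via (3.2.21):", price_via_P)
print("E{Y} =", expect(Y, P))
```

In this made-up example the price 16/11 exceeds the mean E{Y} = 5/6, because Y takes its large values exactly where S is large and the measure Q puts more weight on such outcomes.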

Furthermore, it turns out that with respect to the equilibrium price (3.2.21), the optimal r.v. is

$$\tilde X_i = r_i S, \tag{3.2.22}$$

where

$$r_i = \frac{E\{X_i S\}}{E\{S^2\}}. \tag{3.2.23}$$


To prove this, we should show that such a solution satisfies (3.2.17). First of all, observe that, by (3.2.21), for the r.v. (3.2.22),

$$
\begin{aligned}
d_i = G(X_i - \tilde X_i) &= \frac{1}{E\{S\}} \big(E\{X_i S\} - E\{\tilde X_i S\}\big) = \frac{1}{E\{S\}} \big(E\{X_i S\} - r_i E\{S^2\}\big) \\
&= \frac{1}{E\{S\}} \big(E\{X_i S\} - E\{X_i S\}\big) = 0
\end{aligned}
$$

in view of (3.2.23).

So, $d_i = 0$, and we can write (3.2.17) as the equality $q_m E\{\tilde X_i\} = p_m \tilde x_{im}$. Making use of (3.2.19), we write it as $s_m E\{\tilde X_i\} = \tilde x_{im} E\{S\}$. Recalling the definitions of $s_m$ and $\tilde x_{im}$, we have

$$S(\omega_m) E\{\tilde X_i\} = \tilde X_i(\omega_m) E\{S\}.$$

Substituting $\tilde X_i$ from (3.2.22), we come to an identity.
Thus, in our model, equilibrium reinsurance corresponds to the proportional reinsurance (3.2.22). Since $G(X_i - \tilde X_i) = 0$, we have

$$G(\tilde X_i) = G(X_i), \tag{3.2.24}$$

that is, each participant retains a claim, perhaps different from the original claim, but of the same price as the claim $X_i$.

The last property is due to the special choice of utility functions and parameters $s_i$. It is not very difficult to show that in the case of arbitrary quadratic utility functions, (3.2.22) continues to be true (with different $r_i$) but (3.2.24) may not hold (see [16]).
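The proof above can be replayed on a concrete case. The following sketch assumes a hypothetical market with two companies and three outcomes (all numbers are invented for the illustration). It builds Q from (3.2.20), the retention coefficients from (3.2.23), and then checks that $d_i = 0$ and that every equation in (3.2.17) holds exactly.

```python
from fractions import Fraction as F

# Hypothetical data: outcome probabilities and the claims of two companies.
P = [F(1, 2), F(1, 3), F(1, 6)]
X = [
    [F(0), F(1), F(1)],   # X_1(omega_m)
    [F(1), F(1), F(3)],   # X_2(omega_m)
]
S = [sum(column) for column in zip(*X)]   # total claim in each outcome


def expect(values, probs):
    """Expected value of a discrete r.v. given the outcome probabilities."""
    return sum(v * p for v, p in zip(values, probs))


ES = expect(S, P)
ES2 = expect([s * s for s in S], P)

Q = [s * p / ES for s, p in zip(S, P)]                             # (3.2.20)
r = [expect([x * s for x, s in zip(Xi, S)], P) / ES2 for Xi in X]  # (3.2.23)
X_tilde = [[ri * s for s in S] for ri in r]                        # (3.2.22)

# d_i = E_Q{X_i - X~_i}; the proof shows that it vanishes.
d = [expect([x - xt for x, xt in zip(Xi, Xti)], Q)
     for Xi, Xti in zip(X, X_tilde)]

# Left side minus right side of (3.2.17), for every company and outcome.
residuals = [
    Q[m] * (expect(xt, P) + di) - P[m] * (xt[m] + di)
    for xt, di in zip(X_tilde, d)
    for m in range(len(P))
]

print("r =", r, " d =", d, " residuals =", residuals)
```

The retention coefficients sum to one here, as they must for the exchange to be balanced in the sense of (3.2.14).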




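Let us look at the solutions obtained more closely. Assume the $X_i$’s are independent, and set $m_i = E\{X_i\}$, $\sigma_i^2 = Var\{X_i\}$, $m_S = m_1 + \ldots + m_n$, $\sigma_S^2 = \sigma_1^2 + \ldots + \sigma_n^2$. Then the equilibrium price of the original claim $X_i$ will be

$$G(X_i) = \frac{E\{X_i S\}}{E\{S\}} = \frac{1}{m_S}\Big(E\{X_i^2\} + \sum_{j \ne i} E\{X_i X_j\}\Big) \tag{3.2.25}$$

$$= \frac{1}{m_S}\Big(E\{X_i^2\} + m_i \sum_{j \ne i} m_j\Big) = \frac{1}{m_S}\big(E\{X_i^2\} + m_i (m_S - m_i)\big) = \frac{1}{m_S}\big(\sigma_i^2 + m_i m_S\big) = m_i + \frac{\sigma_i^2}{m_S}, \tag{3.2.26}$$

which looks simple and nice. The larger $\sigma_i$, the more one should pay to reinsure the risk.
Using what we have already computed, for the retention coefficient $r_i$ we have

$$r_i = \frac{E\{X_i S\}}{E\{S^2\}} = \frac{m_i m_S + \sigma_i^2}{m_S^2 + \sigma_S^2} = \frac{m_i + \sigma_i^2 / m_S}{m_S + \sigma_S^2 / m_S}, \tag{3.2.27}$$

which is also nice and meaningful.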
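Formulas (3.2.26) and (3.2.27) are easy to tabulate. The sketch below is a minimal illustration for independent risks; the function name and the three means and variances are hypothetical and serve only to show how the price loading $\sigma_i^2/m_S$ and the retention coefficients behave.

```python
def equilibrium_price_and_retention(means, variances):
    """Return the lists of G(X_i) from (3.2.26) and r_i from (3.2.27)
    for independent risks with the given means and variances."""
    m_S = sum(means)
    var_S = sum(variances)
    prices = [m + v / m_S for m, v in zip(means, variances)]
    retentions = [(m * m_S + v) / (m_S**2 + var_S)
                  for m, v in zip(means, variances)]
    return prices, retentions


# Hypothetical portfolio of three independent risks.
means = [1.0, 2.0, 3.0]
variances = [0.5, 1.0, 4.5]
prices, retentions = equilibrium_price_and_retention(means, variances)

for m, v, g, r in zip(means, variances, prices, retentions):
    print(f"m_i={m:.1f}  var_i={v:.1f}  G(X_i)={g:.4f}  r_i={r:.4f}")
print("sum of r_i:", sum(retentions))
```

Since the sum of $m_i m_S + \sigma_i^2$ over i equals $m_S^2 + \sigma_S^2$, the retention coefficients always add up to one, which the last printed line confirms for this example.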


4 EXERCISES 


Section 1 


1. Show that in the situation of Example 1.1.1-1, for any identically distributed $X_i$’s,

$$E\{R_{d(n)}(X)\} = E\{R_{d(1)}(X_1) + \ldots + R_{d(1)}(X_n)\} = nE\{R_{d(1)}(X_1)\},$$

and give an interpretation. Does it mean that d(n) = nd(1)? For independent $X_i$ distributed as in Example 1.1.1-1, paralleling (1.1.9), show that

$$E\{R_d^2(X)\} = n(n+1)\Gamma(d; n+2) + d^2(1 - \Gamma(d; n)).$$

Set n = 10 and compute $Var\{R_{d(10)}(X)\}$ and $Var\{R_{d(1)}(X_1) + \ldots + R_{d(1)}(X_{10})\}$, the variance of the total payment in the case of excess-of-loss reinsurance. Interpret the result.


2. In the situation of Example 1.1.1-1, write equation (1.1.10) for the case when X;’s have the 
T-distribution with parameters (1,v). 


3. Problems below concern the situation of Section 1.2.1. 


(a) A company has a portfolio with a net premium (i.e., mean payment) of $300,000, and a standard deviation of $100,000. For a new risk, the net premium is $100,000 and the standard deviation is $\sigma_0$. For what $\sigma_0$ is the reinsurance of the new risk reasonable? Find the retention coefficient for $\sigma_0$ = $90,000.

(b) Let the original portfolio consist of i.i.d. risks with a mean of m, and a standard deviation of $\sigma$.




i. Write the formula for the coefficient 2k?°m. Does it depend on the number of 
risks? 
ii. Let the number of the risks in the original portfolio be large. How will the formula 
(1.2.4) look in this case? 
iii. In the situation of Example 1.2.1-2, let q = 0.1, m = 4, © = 1, and let the number 
of risks in the original portfolio be n = 100. Find vVyetainea- For which q in this case 
does the approximation 2k*m have an error less than 5%? 


4. In the situation of Example 1.2.2-1, give an economic explanation why (i) if $\lambda_0$ gets smaller, the need for reinsurance increases; (ii) the same is true if $\sigma_0$ is increasing.

5. In the scheme of Section 1.3.1, find the maximal value of $\gamma_r$.

6. In the scheme of Section 1.3.1, assume that X has the Γ-distribution with parameters $(a, \nu)$, $\theta = 0.05$, $\theta_{reins} = 0.08$. (a) Using (1.3.1) as an approximation, find the estimate of the retention coefficient maximizing the adjustment coefficient. How does your estimate depend on a and $\nu$? Is it a property of the estimate or of the real maximizer? Find the adjustment coefficient for the optimal r. (b) Make a heuristic conjecture about the values of $\theta$, a, and $\nu$ for which the approximation you use will work relatively well.

7. (a) In the scheme of Section 1.3.2, assume that X is exponentially distributed, E{X} = 1, $\theta = 0.05$, $\theta_{reins} = 0.06$. Given d, write an equation for the adjustment coefficient $\gamma_d$. Using software, estimate d for which $\gamma_d$ attains its maximum.

(b) For the same X as in (a), write the formula for the optimal r in the case of proportional reinsurance applied to each claim. Find the particular answer for the $\theta$’s above.


Sections 2 and 3


8. Write a condition under which solution (2.1.3) is meaningful.

9. Consider the situation of Example 2.1-3.

(a) Show that any point from (2.1.9) corresponds to some vector $(Z_1, Z_2)$ for which (2.1.8) is true. (Advice: Consider vectors $(k_1 S, k_2 S)$ for various choices of $k_1$ and $k_2$.)

(b) Does one point from (2.1.9) correspond to one solution $(Z_1, Z_2)$ or perhaps many?

(c) Find the set V and the set of Pareto optimal points V for u(x) = x. Show that in this case, the status-quo point is Pareto optimal, and consequently, the set $V_0$ consists of only one point.

(d) Let $Y_1, Y_2$ be independent and uniformly distributed on [0, 1], and let $u(x) = x^{1/3}$. Show that in this case, $V = \{(V_1, V_2) : V_1^3 + V_2^3 = a^3,\ 0 \le V_1 \le a,\ 0 \le V_2 \le a\}$, where $a = E\{(Y_1 + Y_2)^{1/3}\}$. Find a and the set $V_0$.

To solve this problem, we need the following generalization of the Cauchy-Schwarz inequality, which is called Hölder’s inequality: $E\{\xi\eta\} \le (E\{\xi^p\})^{1/p} (E\{\eta^q\})^{1/q}$ for positive $\xi$, $\eta$ and $p > 1$, $q > 1$, $\frac{1}{p} + \frac{1}{q} = 1$ (see, e.g., [38]). The Cauchy-Schwarz inequality corresponds to p = q = 2. For our problem we need Hölder’s inequality for p = 3, q = 3/2, that is, $E\{\xi\eta\} \le (E\{\xi^3\})^{1/3} (E\{\eta^{3/2}\})^{2/3}$.

10. Find a solution to the problem from Example 2.2-1 for n = 3 and $B_i = B$. Assume that the r.v.’s $Y_i$ are uniformly distributed on [0, 2i] for i = 1, 2, 3, and the participants do not make an additional profit on the exchange on the average.


11. Show that for large $\nu_1, \nu_2$, the solution to (2.2.13) is approaching (2.2.14).

12. (a) Sketch the graph of the function $V(r_1, r_2) = a_1 V_1(r_1, r_2) + a_2 V_2(r_1, r_2)$ from Section 2.3.1. The graph is represented by a surface in the space $(V, r_1, r_2)$. What is this surface called? Does this function have a minimum? Is it unique? In your answer, refer to known facts from Calculus.

(b) Sketch the regions in the plane $(r_1, r_2)$ for which $V_1(r_1, r_2) \le V_1(1, 1)$ and $V_2(r_1, r_2) \le V_2(1, 1)$.

13. Show that the solution (2.3.10) belongs to the set of admissible solutions (2.3.5).

14. Consider the solution (2.3.10) in the case of Examples 2.3.1-1ab. Demonstrate that this is a Pareto optimal solution.

15.* Let $V_1(r_1, r_2)$ and $V_2(r_1, r_2)$ be the functions defined in Section 2.3.2. Analyze the function $V(r_1, r_2) = a_1 V_1(r_1, r_2) + a_2 V_2(r_1, r_2)$ in the spirit of Exercise 12a.

16.** Consider the scheme of Section 2.3.2.

(a) Show that if $c_i = (1 + \theta) m_i$, the set of Pareto optimal solutions depends only on $\tau\theta$ rather than on $\tau$ and $\theta$ separately.

(b) Let $X_1, X_2$ have the Γ-distributions with parameters $(1, \nu_1)$ and $(2, \nu_2)$, respectively, and $\tau\theta = 4$. Find the set of Pareto optimal solutions.

17. The problem concerns Example 3.2-1.

(a) Find the prices $G_Q(X_i)$ and $r_i$ in the case of i.i.d. $X_i$’s. Interpret your result. In particular, describe the behavior of $G_Q(X_i)$ for large n.

(b) Find the formulas for $G_Q(X_i)$ in the case when $Corr\{X_i, X_j\} = \rho$ for $i \ne j$. Compare the results with the case $\rho = 0$. What would you get if $m_i = m$ and $\sigma_i = \sigma$? Interpret the result. In particular, consider the case of large n and the case of n = 2 and $\rho < 0$.

Appendix 


1 SUMMARY TABLES FOR BASIC DISTRIBUTIONS 
TABLE 1. Some basic distributions

DISCRETE

| Distribution | Probabilities | cdf | Mean | Variance | Some properties |
|---|---|---|---|---|---|
| Binomial | $\binom{n}{m} p^m q^{n-m}$, m = 0, 1, ..., n | A step function | $np$ | $npq$ | The distribution of “the number of successes” |
| Geometric: 1st version | $p q^{m-1}$, m = 1, 2, ... | A step function | $1/p$ | $q/p^2$ | 1. The distribution of “the 1st success”. 2. The lack of memory. 3. $P(X > m) = q^m$ |
| Geometric: 2nd version | $p q^m$, m = 0, 1, 2, ... | A step function | $q/p$ | $q/p^2$ | $P(X > m) = q^{m+1}$ |
| Negative binomial: 1st version | $\binom{m-1}{\nu-1} p^\nu q^{m-\nu}$, m = ν, ν+1, ... | A step function | $\nu/p$ | $\nu q/p^2$ | The distribution of “the νth success” |
| Negative binomial: 2nd version | $\binom{\nu+m-1}{m} p^\nu q^m$, m = 0, 1, 2, ... | A step function | $\nu q/p$ | $\nu q/p^2$ | |
| Poisson | $e^{-\lambda}\lambda^m/m!$, m = 0, 1, 2, ... | A step function | $\lambda$ | $\lambda$ | The sum of two independent Poisson r.v.’s with parameters $\lambda_1$ and $\lambda_2$ has the Poisson distribution with the parameter $\lambda_1 + \lambda_2$ |

CONTINUOUS

| Distribution | pdf | cdf | Mean | Variance | Some properties |
|---|---|---|---|---|---|
| Uniform | $1/(b-a)$, $a \le x \le b$ | $(x-a)/(b-a)$, $a \le x \le b$ | $(a+b)/2$ | $(b-a)^2/12$ | All values from [a, b] are equally likely |
| Exponential | $a e^{-ax}$, $x \ge 0$ | $1 - e^{-ax}$, $x \ge 0$ | $1/a$ | $1/a^2$ | 1. The lack of memory. 2. $P(X > x) = e^{-ax}$ |
| Γ-distribution | $a^\nu x^{\nu-1} e^{-ax}/\Gamma(\nu)$, $x \ge 0$ | (no closed form) | $\nu/a$ | $\nu/a^2$ | |
| Normal | $\frac{1}{\sqrt{2\pi}\,\sigma} \exp\{-\frac{(x-m)^2}{2\sigma^2}\}$ | $\Phi\big(\frac{x-m}{\sigma}\big)$ | $m$ | $\sigma^2$ | The sum of independent $(m_1, \sigma_1^2)$- and $(m_2, \sigma_2^2)$-normal r.v.’s is $(m_1 + m_2, \sigma_1^2 + \sigma_2^2)$-normal |
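TABLE 2. Some basic m.g.f.’s

DISCRETE

| Distribution | Probabilities | Moment generating function M(z) |
|---|---|---|
| Binomial | $\binom{n}{m} p^m q^{n-m}$, m = 0, 1, ..., n | $(p e^z + q)^n$ |
| Geometric: 1st version | $p q^{m-1}$, m = 1, 2, ... | $p e^z/(1 - q e^z)$, exists for $z < \ln(1/q)$ |
| Geometric: 2nd version | $p q^m$, m = 0, 1, 2, ... | $p/(1 - q e^z)$, exists for $z < \ln(1/q)$ |
| Negative binomial: 1st version | $\binom{m-1}{\nu-1} p^\nu q^{m-\nu}$, m = ν, ν+1, ... | $[p e^z/(1 - q e^z)]^\nu$, exists for $z < \ln(1/q)$ |
| Negative binomial: 2nd version | $\binom{\nu+m-1}{m} p^\nu q^m$, m = 0, 1, 2, ... | $[p/(1 - q e^z)]^\nu$, exists for $z < \ln(1/q)$ |
| Poisson | $e^{-\lambda}\lambda^m/m!$, m = 0, 1, 2, ... | $\exp\{\lambda(e^z - 1)\}$ |

CONTINUOUS

| Distribution | pdf | Moment generating function M(z) |
|---|---|---|
| Uniform | $1/(b-a)$, $a \le x \le b$ | $(e^{bz} - e^{az})/[z(b-a)]$ |
| Exponential | $a e^{-ax}$, $x \ge 0$ | $1/(1 - z/a)$, exists for $z < a$ |
| Γ-distribution | $a^\nu x^{\nu-1} e^{-ax}/\Gamma(\nu)$, $x \ge 0$ | $[1/(1 - z/a)]^\nu$, exists for $z < a$ |
| Normal | $\frac{1}{\sqrt{2\pi}\,\sigma} \exp\{-\frac{(x-m)^2}{2\sigma^2}\}$ | $\exp\{mz + \sigma^2 z^2/2\}$ |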
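The two tables are linked by the standard facts $M'(0) = E\{X\}$ and $M''(0) - (M'(0))^2 = Var\{X\}$. The sketch below checks a few rows numerically with central finite differences. The helper name, the step size h, and the parameter values are assumptions made for this illustration; the approximation is only accurate to a few decimal places.

```python
import math


def mean_and_variance_from_mgf(M, h=1e-3):
    """Approximate E{X} and Var{X} from an m.g.f. M(z) by central differences at z = 0."""
    first = (M(h) - M(-h)) / (2 * h)            # approximates M'(0)
    second = (M(h) - 2 * M(0.0) + M(-h)) / h**2  # approximates M''(0)
    return first, second - first**2


# Hypothetical parameter values.
lam, a, nu = 3.0, 2.0, 3.0


def mgf_poisson(z):
    return math.exp(lam * (math.exp(z) - 1))


def mgf_exponential(z):
    return 1 / (1 - z / a)


def mgf_gamma(z):
    return (1 / (1 - z / a)) ** nu


results = {
    "Poisson": mean_and_variance_from_mgf(mgf_poisson),          # table: lam, lam
    "Exponential": mean_and_variance_from_mgf(mgf_exponential),  # table: 1/a, 1/a^2
    "Gamma": mean_and_variance_from_mgf(mgf_gamma),              # table: nu/a, nu/a^2
}
for name, (mean, var) in results.items():
    print(f"{name}: mean ~ {mean:.4f}, variance ~ {var:.4f}")
```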




2 TABLES FOR THE STANDARD NORMAL DISTRIBUTION 


TABLE 1. The standard normal distribution function $\Phi(x)$.


0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 


0.508 0.516 0.5199 0.5239 0.5279 0.5319 
0.5478 0.5557 0.5596 0.5636 0.5675 0.5714 
0.5871 0.5948 0.5987 0.6026 0.6064 0.6103 
0.6255 0.6331 0.6368 0.6406 0.6443 0.648 
0.6628 0.67 0.6736 0.6772 0.6808 0.6844 
0.6985 0.7054 0.7088 0.7123 0.7157 0.719 
0.7324 0.7389 0.7422 0.7454 0.7486 0.7517 
0.7642 0.7704 0.7734 0.7764 0.7794 0.7823 
0.7939 0.7995 0.8023 0.8051 0.8078 0.8106 
0.8212 0.8264 0.8289 0.8315 0.834 0.8365 
0.8461 0.8508 0.8531 0.8554 0.8577 0.8599 
0.8686 0.8729 0.8749 0.877 0.879 0.881 
0.8888 0.8925 0.8944 0.8962 0.898 0.8997 
0.9066 0.9099 0.9115 0.9131 0.9147 0.9162 
0.9222 0.9251 0.9265 0.9279 0.9292 0.9306 
0.9357 0.9382 0.9394 0.9406 0.9418 0.9429 
0.9474 0.9495 0.9505 0.9515 0.9525 0.9535 
0.9573 0.9591 0.9599 0.9608 0.9616 0.9625 
0.9656 0.9671 0.9678 0.9686 0.9693 0.9699 
0.9726 0.9738 0.9744 0.975 0.9756 0.9761 
0.9783 0.9793 0.9798 0.9803 0.9808 0.9812 
0.983 0.9838 0.9842 0.9846 0.985 0.9854 
0.9868 0.9875 0.9878 0.9881 0.9884 0.9887 
0.9898 0.9904 0.9906 0.9909 0.9911 0.9913 
0.9922 0.9927 0.9929 0.9931 0.9932 0.9934 
0.9941 0.9945 0.9946 0.9948 0.9949 0.9951 
0.9956 0.9959 0.996 0.9961 0.9962 0.9963 
0.9967 0.9969 0.997 0.9971 0.9972 0.9973 
0.9976 0.9977 0.9978 0.9979 0.9979 0.998 
0.9982 0.9984 0.9984 0.9985 0.9985 0.9986 
0.9987 0.9988 0.9989 0.9989 0.9989 0.999 


0.9 0.91 092 093 094 0.95 0.96 0.97 0.98 0.99 


1.555 1.645 1.751 1.88079 2.05 2.33 infinity 




3 ILLUSTRATIVE LIFE TABLE 


0.000171 
0.000141 
0.000141 
0.000151 
0.000141 
0.000151 
0.000151 
0.000182 
0.000222 
0.000263 
0.000343 
0.000576 
0.000677 
0.000850 
0.000942 
0.000932 
0.000984 
0.000944 
0.000945 
0.000946 
0.000927 
0.000958 
0.000918 
0.000939 
0.000991 
0.001023 
0.001065 
0.001066 
0.001180 
0.001254 
0.001368 
0.001453 
0.001568 
0.001715 
0.001915 


'This table was prepared mainly by Sara Zarei as a part of her Master Thesis [148]. 


TABLE 1. Illustrative Life Table; & = 4% ! 


0.006994 
0.000473 
0.000322 
0.000242 
0.000202 
0.000171 
0.000151 
0.000141 
0.000151 
0.000141 
0.000151 
0.000151 
0.000192 
0.000222 
0.000263 
0.000354 
0.000566 
0.000688 
0.000850 
0.000942 
0.000933 
0.000984 
0.000935 
0.000956 
0.000947 
0.000927 
0.000959 
0.000908 
0.000940 
0.001002 
0.001024 
0.001066 
0.001057 
0.001191 
0.001244 
0.001369 
0.001454 
0.001570 
0.001727 
0.001907 


62.91043 
58.91851 
60.87854 
63.06097 
65.40848 
67.88993 
70.50124 
73.24761 
76.10645 
79.07307 
82.17047 
85.38548 
88.73218 
92.18848 
95.74994 
99.42113 
103.1706 
106.8667 
110.6257 
114.3878 
118.2258 
122.2323 
126.3611 
130.6971 
135.2136 
139.9180 
144.8358 
149.9322 
155.2755 
160.8241 
166.5611 
172.5120 
178.6776 
185.1007 
191.7007 
198.5195 
205.5341 
212.7786 
220.2394 
227.9031 


23.89891 
24.00071 
23.95073 
23.89507 
23.83520 
23.77191 
23.70532 
23.63527 
23.56237 
23.48671 
23.40771 
23.32572 
23.24037 
23.15222 
23.06139 
22.96776 
22.87214 
22.77788 
22.68201 
22.58606 
22.48818 
22.38600 
22.28070 
22.17012 
22.05494 
21.93496 
21.80954 
21.67956 
21.54329 
21.40178 
21.25547 
21.10370 
20.94646 
20.78265 
20.61433 
20.44043 
20.26153 
20.07677 
19.88650 
19.69105 


13.946618 
8.1953125 
8.4085569 
8.789316 
9.2817204 
9.8551417 
10.506343 
11.241791 
12.038585 
12.891885 
13.826322 
14.828768 
15.914845 
17.061767 
18.264756 
19.52857 
20.818797 
21.9895 
23.159305 
24.259035 
25.361779 
26.566503 
27.822582 
29.223142 
30.740919 
32.385796 
34.188179 
36.11216 
38.236782 
40.52005 
42.946009 
45.546363 
48.326154 
51.339598 
54.499572 
57.857616 
61.392071 
65.147237 
69.113307 
73.279881 


0.002088 
0.002249 
0.00237 
0.002631 
0.002826 
0.003065 
0.003296 
0.003508 
0.003891 

0.00413 
0.004425 
0.00482 
0.005005 
0.005551 
0.005843 
0.006722 
0.006613 
0.007624 
0.008343 
0.00943 
0.009748 
0.01088 
0.011907 
0.012958 
0.014095 
0.015313 
0.016473 
0.018212 
0.019619 
0.021674 
0.02364 
0.02564 
0.027668 
0.030537 
0.033275 
0.036579 
0.039779 
0.043331 
0.047214 
0.052513 
0.057608 
0.062248 
0.071455 
0.073427 
0.084892 
0.093125 
0.101898 
0.111261 
0.121195 
0.131674 


TABLE 1 (continued).


0.002076 
0.002237 
0.002367 
0.002635 
0.002830 
0.003060 
0.003312 
0.003514 
0.003899 
0.004138 
0.004435 
0.004832 
0.005018 
0.005566 
0.005861 

0.006734 
0.006646 
0.007642 
0.008379 
0.009475 
0.009796 
0.010940 
0.011979 
0.013031 

0.014207 
0.015420 
0.016611 

0.018392 
0.019814 
0.021913 
0.023911 

0.025975 
0.028059 
0.031013 
0.033841 

0.037265 
0.040593 
0.044298 
0.048366 
0.053962 
0.059316 
0.064293 
0.074140 
0.076266 
0.088718 
0.097756 
0.107479 
0.117961 

0.129203 
0.141201 


235.7404 
243.7926 
252.0708 
260.6175 
269.3311 
278.2828 
287.4555 
296.8694 
306.5525 
316.4033 
326.5348 
336.927 
347.5323 
358.5046 
369.6364 
381.1051 
392.5751 
404.6593 
416.7269 
428.9696 
441.2068 
453.8895 
466.6098 
479.4542 
492.4442 
505.5724 
518.8371 
532.3068 
545.7582 
559.3867 
572.9602 
586.5695 
600.2583 
614.0775 
627.7719 
641.4615 
655.0211 
668.5689 
682.0779 
695.5393 
708.6238 
721.4993 
734.4118 
746.2522 
759.0123 
770.5061 
781.6138 
792.3532 
802.7426 
812.8176 


19.49117 
19.28581 
19.07469 
18.85672 
18.63449 
18.40619 
18.17226 
17.93218 
17.68522 
17.43399 
17.17561 
16.91057 
16.6401 
16.36027 
16.07637 
15.78388 
15.49136 
15.18317 
14.87541 
14.56318 
14.25109 
13.92764 
13.60323 
13.27565 
12.94436 
12.60955 
12.27126 
11.92773 
11.58468 
11.23711 
10.89094 
10.54386 
10.19474 
9.842311 
9.493058 
9.143927 
8.798111 
8.452597 
8.108073 
7.764762 
7.431065 
7.102696 
6.773383 
6.471414 
6.145988 
5.85286 
5.569577 
5.295686 
5.030721 
4.773775 


77.616789 
82.177482 
86.981719 
92.088931 
97.383939 
102.95937 
108.80271 
114.94751 
121.43956 
128.16161 
135.26474 
142.73742 
150.5311 
158.85828 
167.4679 
176.60436 
185.84031 
196.0015 
206.27453 
216.92105 
227.70509 
239.25446 
251.03263 
263.16664 
275.69957 
288.63526 
301.98609 
315.86774 
329.97319 
344.59746 
359.41414 
374.56309 
390.12187 
406.18399 
422.3752 
438.88263 
455.51941 
472.4736 
489.71345 
507.23497 
524.51174 
541.80071 
559.50566 
575.79362 
593.93351 
610.31981 
626.35592 
642.04865 
657.40544 
672.46303 




TABLE 1 (continued). 


Pre rien 


7 


0.154059 | 822.6350 687.29561 
0.167710 | 832.2601| 4. 701.99372 


0.166541 0.182198 | 841.8022| 4. 716.72837 
0.179267 0.197594 | 851.4107) 3. 731.74768 


0.192478 0.213833 | 861.2915| 3. 747.41210 
0.206227 0.230865 | 871.7577) 3. 764.29360 
0.220270 0.249053 | 883.2599| 2. 783.24968 
0.234809 0.267753 | 896.5109) 2. 805.68180 
0.249803 0.287594 | 912.5685} 2. 833.74700 
0.264984 0.308116 | 933.0991} 1. 870.95040 

1 1.000000 | 960.7894 923.11635 




4 SOME FACTS FROM CALCULUS 
4.1 The “little o and big O” notation 
4.1.1 Little o 


What we discuss below is not a new notion but a notation which turns out to be useful in 
many calculations. We start with a particular definition. 

Denote by the symbol o(x) (little o of x) any function $o(x) = \varepsilon(x)x$, where the function $\varepsilon(x) \to 0$ as $x \to 0$. In other words, $o(x) \to 0$ faster than x.

Another way to define o(x) is to say that

$$\frac{o(x)}{x} \to 0 \quad \text{as } x \to 0,$$

which is the same.

For example, $x^2 = o(x)$, and $x^{3/2} = o(x)$, while $\sqrt{x}$ is not o(x).

Heuristically, the formula $x^2 = o(x)$ means that “$x^2$ is much smaller than x” for small x’s.

Certainly, we can replace x by, say, $x^2$, and in this case, the expression $o(x^2)$ denotes any function such that $[o(x^2)/x^2] \to 0$. For example, $x^3 = o(x^2)$.

If a function

$$f(x) = 1 + 2x^2 + o(x^2) \quad \text{as } x \to 0, \tag{4.1.1}$$

we can say that f(x) converges to one, as $x \to 0$, at a rate of $2x^2$ up to a remainder $o(x^2)$ which is negligible in comparison with the term $2x^2$ for small x’s. Formally, (4.1.1) means that $f(x) - 1 - 2x^2 = o(x^2)$, that is,

$$\frac{f(x) - 1 - 2x^2}{x^2} \to 0 \quad \text{as } x \to 0.$$

EXAMPLE 1. At what rate does the function $f(x) = (1 + 4x^2)^3$ converge to one as $x \to 0$? By the formula $(1 + a)^3 = 1 + 3a + 3a^2 + a^3$,

$$(1 + 4x^2)^3 = 1 + 3 \cdot 4x^2 + A(x),$$

where A(x) is the sum of the terms containing higher powers of $4x^2$, that is, $(4x^2)^2$ and $(4x^2)^3$. The sum of all such terms converges to zero faster than $x^2$. Hence, the remainder A(x) is $o(x^2)$. Thus,

$$(1 + 4x^2)^3 = 1 + 12x^2 + o(x^2).$$

If we want to approximate f(x) for small x’s with greater accuracy, we can write

$$(1 + 4x^2)^3 = 1 + 3 \cdot 4x^2 + 3 \cdot (4x^2)^2 + o(x^4) = 1 + 12x^2 + 48x^4 + o(x^4).$$

The reader may find more sophisticated examples connected with Taylor’s expansion in Section 4.2.
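The two expansions of Example 1 can be checked by computing the ratios from the definition of little o. The sketch below uses exact rational arithmetic, because with floating point the remainder of order $x^4$ would be lost in rounding for very small x. The chosen sample points $x = 10^{-k}$ are an arbitrary illustration.

```python
from fractions import Fraction as F


def f(x):
    return (1 + 4 * x**2) ** 3


xs = [F(1, 10**k) for k in range(1, 6)]

# Remainder after the quadratic term, divided by x^2: should tend to zero.
ratio_order2 = [(f(x) - 1 - 12 * x**2) / x**2 for x in xs]

# Remainder after the x^4 term, divided by x^4: should also tend to zero.
ratio_order4 = [(f(x) - 1 - 12 * x**2 - 48 * x**4) / x**4 for x in xs]

for x, r2, r4 in zip(xs, ratio_order2, ratio_order4):
    print(f"x={float(x):.0e}  A(x)/x^2={float(r2):.3e}  remainder/x^4={float(r4):.3e}")
```

Expanding the cube shows that the first ratio equals $48x^2 + 64x^4$ and the second equals $64x^2$, so both vanish as $x \to 0$.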


It is worth emphasizing that o(x) denotes not a particular function but a function (or some function) with the above property. Therefore, although we can write the expression 2o(x), it would not make much sense since multiplying o(x) by 2 we get a function with the same property: it converges to zero faster than x. Thus, we can write 2o(x) = o(x). For the same reason, we may write that o(x) + o(x) = o(x), etc.


EXAMPLE 2. Let us approximate the function $f(x) = (1 + x)^4 + (1 + 3x^2)^3$ for small x’s by a quadratic function. Similar to what we did in Example 1, and recalling that $(1 + a)^4 = 1 + 4a + 6a^2 + 4a^3 + a^4 = 1 + 4a + 6a^2 + o(a^2)$, we have

$$f(x) = 1 + 4x + 6x^2 + o(x^2) + 1 + 3 \cdot 3x^2 + o(x^2) = 2 + 4x + 15x^2 + o(x^2).$$

So, $f(x) = 2 + 4x + 15x^2$ plus a term which is negligible in comparison with the main term as $x \to 0$.


The same relations may certainly be established not only for x’s close to zero but for x close to any point, or for $x \to \infty$. The general definition looks as follows.
Let us consider functions f(x) and g(x). We say that

$$f(x) = o(g(x)) \quad \text{as } x \to x_0, \tag{4.1.2}$$

(f is a little “o” of g) if

$$\frac{f(x)}{g(x)} \to 0 \quad \text{as } x \to x_0. \tag{4.1.3}$$

Here $x_0$ is any number, or $\infty$, or $-\infty$.

For example, $x^{-2} = o(x^{-1})$ as $x \to \infty$, or $(x - 1)^3 = o((x - 1)^2)$ as $x \to 1$.

For another good example, let us set g(x) = 1 in (4.1.2). Then f(x) = o(1). By definition (4.1.3), this relation means that $f(x) \to 0$ as $x \to x_0$. Therefore, the expression o(1) denotes any vanishing function. For instance, instead of saying that $\ln(1 + x) \to 0$ as $x \to 0$, we can say that $\ln(1 + x) = o(1)$. Actually, it is not shorter but sometimes convenient.


4.1.2 Big O 


For a d > 0, we call an interval $\Delta = (x_0 - d, x_0 + d)$ a neighborhood of a point $x_0$. We say that

$$f(x) = O(g(x)) \quad \text{as } x \to x_0,$$

(“f is a big O of g”), if there exist a constant C and a neighborhood $\Delta$ of $x_0$ such that for all $x \in \Delta$

$$|f(x)| \le C|g(x)|. \tag{4.1.4}$$

For instance, if we want to say that $f(x) = 1 + x + x^2$ has the same order as $1 + x$ for x close to zero, we can write that

$$1 + x + x^2 = O(1 + x) \tag{4.1.5}$$

as $x \to 0$. Certainly, (4.1.5) is not true for all x’s because $\frac{1 + x + x^2}{1 + x} \to \infty$ as $x \to \infty$ but, for example, $1 + x + x^2 \le 2(1 + x)$ for $|x| \le 1/2$, which the reader can readily verify.

If $x_0 = \infty$ (or $x_0 = -\infty$), we say that f(x) = O(g(x)) as $x \to \infty$ (or $x \to -\infty$), if (4.1.4) is true for x’s larger than some $c_0$ (or smaller than some negative $c_0$).

For example, $x + 3x^2 = O(x^2)$ as $x \to \infty$.
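The claim that (4.1.5) holds near zero with C = 2 but fails globally can be verified directly. The sketch below checks inequality (4.1.4) on a grid in $[-1/2, 1/2]$ and then looks at the ratio for a few large x. The grid spacing and the sample points are arbitrary choices for this illustration.

```python
from fractions import Fraction as F


def f(x):
    return 1 + x + x**2


def g(x):
    return 1 + x


# Check |f(x)| <= 2 |g(x)| on a grid covering the neighborhood |x| <= 1/2.
grid = [F(k, 1000) for k in range(-500, 501)]
bound_holds_near_zero = all(abs(f(x)) <= 2 * abs(g(x)) for x in grid)

# Far from zero the same constant no longer works: the ratio grows without bound.
large_points = [F(10), F(100), F(1000)]
ratios = [f(x) / g(x) for x in large_points]

print("bound holds for |x| <= 1/2:", bound_holds_near_zero)
print("f/g at 10, 100, 1000:", [float(r) for r in ratios])
```

Algebraically, $1 + x + x^2 \le 2(1 + x)$ is equivalent to $x^2 - x - 1 \le 0$, which holds between the roots $(1 \pm \sqrt{5})/2$, an interval that contains $[-1/2, 1/2]$.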




4.2 Taylor expansions 
4.2.1 A general expansion 


The Taylor expansion concerns approximations of sufficiently smooth functions f(x) by polynomials in a neighborhood of a particular chosen point $x_0$. Without loss of generality, we can set $x_0 = 0$. If it is not so, it suffices to translate the origin to $x_0$, or more precisely, to consider instead of f(x) the function $f_0(x) = f(x + x_0)$, and apply the Taylor expansion to the latter function.

Let f(x) be n + 1 times differentiable in a neighborhood of zero, that is, for $x \in \Delta = (-d, d)$ for some d > 0. Then the Taylor formula states the following: for all $x \in \Delta$,

$$f(x) = P_n(x) + R_n(x), \tag{4.2.1}$$

where the Taylor polynomial of the nth order

$$P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!}\, x^k,$$

$f^{(k)}$ is the kth derivative of f, and the remainder

$$R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}\, x^{n+1} \tag{4.2.2}$$

for some c between 0 and x.

The polynomial $P_n(x)$ is the best approximating polynomial in the sense that the first n derivatives of $P_n(x)$ and f(x) at the origin coincide; i.e., $P_n^{(k)}(0) = f^{(k)}(0)$, k = 0, 1, ..., n.

The remainder $R_n(x)$ serves as an estimate of the accuracy of the polynomial approximation.

If we do not need to estimate the accuracy, we do not have to assume existence of the derivative $f^{(n+1)}$, and may restrict ourselves to the formula

$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!}\, x^k + o(x^n). \tag{4.2.3}$$

This is true if the nth derivative $f^{(n)}(x)$ exists and is continuous in a neighborhood of zero. The notation $o(\cdot)$ is explained in Section 4.1.

If f has all derivatives, then under some additional conditions, we can set $n = \infty$, and write

$$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!}\, x^k. \tag{4.2.4}$$

For example, this is true if $|f^{(n)}(x)| \le M^n$ for some M and all n in the neighborhood where we approximate the function.
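As an illustration of (4.2.1) and (4.2.2), take $f(x) = e^x$, for which every derivative at zero equals one. Since $|f^{(n+1)}(c)| = e^c \le e^{|x|}$ for c between 0 and x, the remainder satisfies $|R_n(x)| \le e^{|x|}|x|^{n+1}/(n+1)!$. The sketch below compares the actual error of the Taylor polynomial with this bound. The function names and the point x = 0.5 are choices made for this example only.

```python
import math


def taylor_exp(x, n):
    """Taylor polynomial P_n(x) of e^x around zero."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))


def lagrange_bound(x, n):
    """Upper bound for |R_n(x)| obtained from (4.2.2) with e^c <= e^|x|."""
    return math.exp(abs(x)) * abs(x) ** (n + 1) / math.factorial(n + 1)


x = 0.5
rows = [
    (n, abs(math.exp(x) - taylor_exp(x, n)), lagrange_bound(x, n))
    for n in range(1, 7)
]
for n, error, bound in rows:
    print(f"n={n}  |f - P_n|={error:.3e}  bound={bound:.3e}")
```

Each additional term reduces both the error and the bound by roughly a factor of $x/(n+1)$, which is why a few terms already suffice for small x.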




4.2.2 Some particular expansions 


All expansions below are verified by making use of the general formulas (4.2.3) and (4.2.4).
The exponential function: for all x’s,

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots = \sum_{k=0}^{\infty} \frac{x^k}{k!}. \tag{4.2.5}$$

As $x \to 0$, it often suffices to consider the first three terms, that is, the approximation

$$e^x = 1 + x + \frac{x^2}{2} + o(x^2). \tag{4.2.6}$$

The logarithmic function. For $\ln(1 + x)$, the expansion

$$\ln(1 + x) = \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k}\, x^k \tag{4.2.7}$$

is true for $-1 < x \le 1$. In particular,

$$\ln(1 + x) = x - \frac{x^2}{2} + o(x^2) \quad \text{as } x \to 0. \tag{4.2.8}$$

From the last relation it follows that

$$\ln(1 + x) \sim x \quad \text{as } x \to 0,$$

where, as usual, the symbol $a(x) \sim b(x)$ means that $\frac{a(x)}{b(x)} \to 1$.


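The power function. Next, we consider the function $(1 - x)^{-\alpha}$ for $\alpha > 0$. The Taylor expansion here is true for $|x| < 1$, and is given by the formula

$$(1 - x)^{-\alpha} = \sum_{k=0}^{\infty} \binom{-\alpha}{k} (-x)^k = \sum_{k=0}^{\infty} \binom{\alpha + k - 1}{k} x^k, \tag{4.2.9}$$

where for any real r

$$\binom{r}{k} = \frac{r(r-1) \cdots (r-k+1)}{k!}.$$

[We defined it also in (3.1.3).] The second equality in (4.2.9) is based on the formula

$$\binom{-\alpha}{k} (-1)^k = \binom{\alpha + k - 1}{k},$$

which is true since

$$\binom{-\alpha}{k} = \frac{(-\alpha)(-\alpha - 1) \cdots (-\alpha - k + 1)}{k!} = (-1)^k\, \frac{\alpha(\alpha + 1) \cdots (\alpha + k - 1)}{k!} = (-1)^k \binom{\alpha + k - 1}{k}.$$

For $\alpha = 1$, the coefficient $\binom{\alpha + k - 1}{k} = \binom{k}{k} = 1$, and for $|x| < 1$ we have

$$\frac{1}{1 - x} = 1 + x + x^2 + \ldots = \sum_{k=0}^{\infty} x^k. \tag{4.2.10}$$

We see that this is just the formula for the geometric series, which can certainly be derived without Taylor’s expansion from the well known formula

$$1 + x + \ldots + x^n = \frac{1 - x^{n+1}}{1 - x}. \tag{4.2.11}$$

The last formula is true for all $x \ne 1$.


4.3 Concavity


In introductory Calculus courses, a concave function u is often defined as a function for which $u''(x) \le 0$. For us, this definition is somewhat restrictive. The definition below does not contradict the above definition if u is twice differentiable.


Definition. We say that a function u(x) defined on an interval I is concave if for any $x_1, x_2 \in I$ and any $\lambda \in [0, 1]$,

$$\lambda u(x_1) + (1 - \lambda) u(x_2) \le u(\lambda x_1 + (1 - \lambda) x_2). \tag{4.3.1}$$

See, as an illustration, Fig.1.

FIGURE 17. The first definition of a concave function. [The graph of u(x) lies above the chord: at the point $\lambda x_1 + (1 - \lambda) x_2$, the value $u(\lambda x_1 + (1 - \lambda) x_2)$ is not less than $\lambda u(x_1) + (1 - \lambda) u(x_2)$.]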


It is known that u(x) so defined is continuous at any interior point of I; see, e.g., [142].
The following proposition may be viewed as another definition of concavity.


Proposition 1 A function u(x) is concave on an interval I if and only if for any interior point $x_0 \in I$ there exists a number c, perhaps depending on $x_0$, such that for any $x \in I$

$$u(x) - u(x_0) \le c(x - x_0). \tag{4.3.2}$$

Note that Proposition 1 does not presuppose that the number c is unique. If u(x) is differentiable at $x_0$, then (4.3.2) is true for $c = u'(x_0)$; see Fig.2i. However, if u is not smooth at $x_0$, then there may be many c’s for which (4.3.2) is true; see Fig. 2ii. A proof may be also found in [142]. The line $c(x - x_0)$ is called a support of u(x) at $x_0$.
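Both definitions can be tested on a function that is concave but not smooth. The sketch below assumes the example $u(x) = -|x|$ on $I = [-2, 2]$ (a choice made for this illustration, not taken from the text). It checks (4.3.1) on a grid and then tests several candidate slopes c in (4.3.2) at the kink $x_0 = 0$. Exact fractions avoid spurious failures caused by rounding.

```python
from fractions import Fraction as F
from itertools import product


def u(x):
    """A concave function with a kink at zero (illustrative choice)."""
    return -abs(x)


grid = [F(k, 4) for k in range(-8, 9)]     # points of I = [-2, 2]
lambdas = [F(k, 10) for k in range(11)]    # lambda in [0, 1]

# Definition (4.3.1): the chord lies below the graph.
concave_ok = all(
    lam * u(x1) + (1 - lam) * u(x2) <= u(lam * x1 + (1 - lam) * x2)
    for x1, x2, lam in product(grid, grid, lambdas)
)


def is_support_slope(c, x0=F(0)):
    """Check inequality (4.3.2) at x0 for the slope c on the whole grid."""
    return all(u(x) - u(x0) <= c * (x - x0) for x in grid)


candidates = [F(-1), F(-1, 2), F(0), F(1, 2), F(1), F(3, 2)]
support_flags = {c: is_support_slope(c) for c in candidates}

print("(4.3.1) holds on the grid:", concave_ok)
for c, ok in support_flags.items():
    print(f"c = {c}: (4.3.2) holds -> {ok}")
```

Every slope between -1 and 1 passes the test at the kink, while c = 3/2 fails, which matches the remark that c need not be unique when u is not smooth at $x_0$.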


FIGURE 18. The second definition of a concave function. [Panels (i) and (ii): the graph of u(x) with a support line at $x_0$.]

We call a function u(x) convex if (4.3.1) (or (4.3.2)) is true with the replacement of the inequality sign $\le$ by the sign $\ge$. Note that there is no need to explore convex functions separately because, if a function u(x) is convex, then the function $-u(x)$ is concave.


References 


The list below should not be considered a bibliography. It is a list of books and papers to 
which we refer in this book. 


[1] Albanese, C., Credit Exposure, Diversification Risk and Coherent VaR, Working Paper, Department of Mathematics, University of Toronto, September, 1997.

[2] Allais, M., Le comportement de l’homme rationnel devant le risque: Critique des postulats et axiomes de l’École Américaine, Econometrica, 21, 1953.

[3] Arias, E., United States Life Tables, 2002, Monthly vital statistics report, vol. 53, no. 6, Hyattsville, Maryland: National Center for Health Statistics, November, 2004.

[4] Anderson, A.W., Pension Mathematics for Actuaries, 2nd edition, ACTEX Publications, Winsted, Connecticut, 1992. 13, 1, 35, 1993.

[5] Arrow, K.J., Essays in the Theory of Risk Bearing, Markham, Chicago, 1971.

[6] Arrow, K.J., and Hahn, F.H., General Competitive Analysis, San Francisco: Holden-Day, 1971.

[7] Arrow, K.J., Optimal insurance and generalized deductibles, Scandinavian Actuarial Journal, 1, 1974.

[8] Artzner, P., Delbaen, F., Eber, J.-M., and Heath, D., Coherent measures of Risk, Math. Finance, 9, 3, 1999.

[9] Artzner, P., Applications of coherent measures to capital requirements in insurance, North American Actuarial Journal, 3, 2, 1999.

[10] Asmussen, S., Ruin Probability, World Scientific, Singapore, River Edge, N.J., 2000.

[11] Beard, R.E., Pentikäinen, T., and Pesonen, E., Risk Theory, Chapman & Hall, London, New York, 1984.

[12] Bening, V.E. and Korolev, V.Yu., Generalized Poisson Models and Their Applications in Insurance and Finance, VSP, Utrecht, 2002.

[13] Bernoulli, D., Specimen Theoriae novae de mensura sortis, Commentarii Academiae Scientiarum Imperialis Petropolitanae, Tomus 5 (Papers of Imperial Academy of Sciences in Petersburg, v.5), p.175-192, 1738; translated into English in Econometrica, 22, N1, 22-36, 1954.


[14] Bhattacharya, R.N. and Rao, R.R., Normal Approximation and Asymptotic Expansion, SIAM, Fla, 2010.

[15] Borch, K., The rationale of the mean-standard deviation analysis: comment, American Economic Review, 64, 428, 1974.

[16] Borch, K., The Mathematical Theory of Insurance, D.C. Heath and Company, Lexington, Massachusetts, 1974.

[17] Borch, K., Economics of Insurance, North-Holland, Elsevier Science Publishers B.V., Amsterdam, New York, North-Holland, 1990.

[18] Borg, S., On Optimal Dividend Payoffs: Karl Borch’s Model, Master Thesis, San Diego State University, 2003.

[19] Bowers, N.L., Gerber, H.U., Hickman, J.C., Jones, D.A., and Nesbitt, C.J., Actuarial Mathematics, Society of Actuaries, Schaumburg, Illinois, 1997.

[20] Boyle, P.P., Karl Borch’s Research Contributions to Insurance, The Journal of Risk and Insurance, 62, 2, 307, 1990.

[21] Bühlmann, H., An economic premium principle, ASTIN Bulletin, 11, 52, 1980.

[22] Bühlmann, H., The general economic premium principle, ASTIN Bulletin, 14, 13-21, 1984.

[23] Bühlmann, H., Gagliardi, B., Gerber, H., and Straub, E., Some inequalities for stop-loss premium, ASTIN Bulletin, 9, 75, 1979.

[24] Chew, S.H., Axiomatic utility theories with the betweenness property, Annals of Operations Research, 19, 273, 1989.

[25] Chew, S.H. and MacCrimmon, K.R., Alpha-nu choice theory: A generalization of expected utility theory, Working paper no. 669, University of British Columbia, Vancouver, 1979.

[26] Chew, S.H. and MacCrimmon, K.R., Alpha utility theory, lottery composition, and the Allais paradox, Working paper no. 686, Faculty of Commerce, University of British Columbia, Vancouver, 1979.

[27] Chow, Y.S. and Teicher, Henry, Probability Theory, Independence, Interchangeability, Martingales, Springer-Verlag, New York, 1978.

[28] Cowley, A. and Cummins, J.D., Securitization of Life Insurance and Liabilities, The Journal of Risk and Insurance, 72, 193, 2005.

[29] Cvitanic, J. and Zapatero, F., Introduction to the Economics and Mathematics of Financial Markets, MIT Press, Cambridge, Massachusetts, 2004.

[30] Daniel, J.W., Multi-state transition models with actuarial applications, Study Notes, SOA and CAS; http://casact.org/library/studynotes/daniel.pdf, 2004.

[31] Daykin, C.D., Pentikäinen, T., and Pesonen, M., Practical risk theory for actuaries, Monographs on Statistics and Applied Probability, 53, Chapman and Hall, Ltd., London, 1994.


[32] Dekel, E., An axiomatic characterization of preferences under uncertainty: Weakening the independence axiom, Journal of Economic Theory, 40, 304, 1986.

[33] Denuit, M., Dhaene, J., Goovaerts, M., and Kaas, R., Actuarial Theory for Dependent Risks, J. Wiley, New York, 2005.

[34] DeVylder, F., Martingales and ruin in a dynamic risk process, Scandinavian Actuarial Journal, 217, 1977.

[35] Doob, J.L., Stochastic Processes, Wiley, New York, 1990; the first edition, 1953.

[36] Durrett, R., Essentials of Stochastic Processes, Springer, New York, 1999.

[37] Embrechts, P., Klüppelberg, C., and Mikosch, T., Modeling Extremal Events for Insurance and Finance, Springer, Berlin, New York, 1997.

[38] Feller, W., Introduction to Probability Theory, 2, Wiley, New York, 1968.

[39] Fishburn, P.C., Non-linear Preference and Utility Theory, Johns Hopkins University Press, Baltimore, 1988.

[40] Foldes, L., Expected utility and continuity, Review of Economic Studies, 39, 4, 1972.

[41] Gerber, H.U., Martingales in Risk Theory, Mitt. Ver. Schweiz. Vers. Math., 73, 205, 1973.

[42] Gerber, H.U., An Introduction to Mathematical Risk Theory, S.S. Huebner Foundation Monographs, University of Pennsylvania, 1979.

[43] Gerber, H.U. (with exercises contributed by Samuel H. Cox), Life Insurance Mathematics, 3rd ed., Springer, Berlin, New York, 1997.

[44] Gerber, H.U. and Shiu, E.S., On the time value of ruin, North American Actuarial Journal, 2, 1, 48, 1998.

[45] Gikhman, I.I. and Skorohod, A.V., Introduction to the Theory of Random Processes, W. B. Saunders Co., Philadelphia, 1969.

[46] Gihman, I.I. and Skorohod, A.V., The Theory of Stochastic Processes, Springer-Verlag, New York, 1974-1979.

[47] Gikhman, I.I. and Skorohod, A.V., Controlled Stochastic Processes, Springer, Berlin, 1979.

[48] Gollier, C. and Schlesinger, H., Arrow’s theorem on optimality of deductibles: a stochastic dominance approach, Economic Theory, 7, 359-363, 1996.

[49] Goovaerts, M.G., De Vylder, F., and Haezendonck, J., Insurance Premiums, North-Holland, Amsterdam, 1984.

[50] Grandell, J., Aspects of Risk Theory, Springer-Verlag, Berlin, New York, Heidelberg, 1991.

[51] Gradshteyn, I.S. and Ryzhik, I.M., Table of Integrals, Series, and Products, Academic Press, San Diego, 2000.


[52] Green, J. and Jullien, B., Ordinal independence in non-linear utility theory, Quarterly Journal of Economics, 102, 4, 785, 1988.

[53] Gupta, A.K. and Varga, T., An Introduction to Actuarial Mathematics, Kluwer Academic, Dordrecht, Boston, 2002.

[54] Haezendonck, J. and Goovaerts, M.J., A new premium calculation principle based on Orlicz norm, Insurance: Mathematics and Economics, 1, 41, 1982.

[55] Hardy, G.H., Littlewood, J.E., and Polya, G., Inequalities, 2nd ed., Cambridge University Press, Cambridge, 1959.

[56] Hastie, R. and Dawes, R.M., Rational Choice in an Uncertain World, Sage, Thousand Oaks, Calif., 2001.

[57] van Heerwaarden, A.E., Kaas, R., and Goovaerts, M.J., Properties of Esscher premium calculation principle, Insurance: Mathematics and Economics, 8, 261, 1989.

[58] van Heerwaarden, A.E., Kaas, R., and Goovaerts, M.J., Optimal reinsurance in relation to ordering of risks, Insurance: Mathematics and Economics, 8, 11, 1989.

[59] Hoel, P.G., Port, S.C., and Stone, C.J., Introduction to Probability Theory, Houghton Mifflin Company, Boston, 1971.

[60] Holton, G.A., Value-at-Risk, Academic Press, Amsterdam, Boston, 2003.

[61] Horn, R.A. and Johnson, C.R., Matrix Analysis, Cambridge University Press, Cambridge, 1985.

[62] Hull, J.C., Options, Futures, and Other Derivatives, Prentice Hall, New Jersey, 2002.

[63] Ibragimov, I.A. and Linnik, Yu.V., Independent and Stationary Sequences of Random Variables, Wolters-Noordhoff, Groningen, 1971.

[64] Introduction to RiskMetrics™, Morgan Guaranty Trust Company, 1995.

[65] Jensen, J.L.W.V., Sur les fonctions convexes et les inégalités entre les valeurs moyennes, Acta Math., 30, 175-193, 1906.

[66] Jorion, P., Value at Risk: the New Benchmark for Controlling Market Risk, McGraw-Hill, New York, 1997.

[67] Kaas, R., Goovaerts, M., and Dhaene, J., Modern Actuarial Theory, Kluwer, Dordrecht, The Netherlands, London, 2001.

[68] Kahneman, D. and Tversky, A., Prospect Theory: Analysis of decision under risk, Econometrica, 47, 263, 1979.

[69] Kalashnikov, V.V., Geometric Sums: Bounds for Rare Events with Applications in Risk Analysis, Reliability, Queueing, Kluwer, Dordrecht, London, 1997.

[70] Kallenberg, O., Foundations of Modern Probability, Springer, New York, 2000.


[71] Karni, E., Optimal insurance: A non-expected utility analysis, in: G. Dionne, ed., Contributions to Insurance Economics, Kluwer Academic Publishers, Dordrecht, London, 1992.

[72] Keeney, R.L. and Raiffa, H., Decisions with Multiple Objectives: Preferences and Value Trade-offs, Wiley, New York, 1976.

[73] Kiruta, A.Ya., Rubinov, A.M., and Yanovskaya, E.B., Optimal Choice of Distributions in Complex Social-Economic Systems (Probability Approach) (in Russian), Nauka, Moscow, 1980.

[74] Klugman, S.A., Panjer, H.H., and Willmot, G.E., Loss Models: from Data to Decisions, 2nd ed., Wiley-InterScience, New Jersey, 2004.

[75] Kolmogorov, A.N., Foundations of the Theory of Probability, Chelsea Pub. Co., New York, 1956.

[76] Kolmogorov, A.N. and Fomin, S.V., Elements of the Theory of Functions and Functional Analysis, Graylock Press, Rochester, New York, 1957-65.

[77] Korolev, V.Yu. and Shorgin, S.Ya., On the absolute constant in the remainder term estimate in the central limit theorem for Poisson random sums, in: Probabilistic Methods in Discrete Mathematics, VSP, Utrecht, 1997, 305.

[78] Kremer, E., On robust premium principles, Insurance: Mathematics and Economics, 5, 271, 1986.

[79] Lay, D., Linear Algebra and its Applications, Addison-Wesley, Reading, Massachusetts, 2003.

[80] LeCam, L., An approximation theorem for the Poisson binomial distribution, Pacific J. Math., 10, 1181, 1960.

[81] LeRoy, S. and Werner, J., Principles of Financial Economics, Cambridge University Press, 2001.

[82] Luce, R.D. and Raiffa, H., Games and Decisions: Introduction and Critical Survey, John Wiley, New York, 1957; last edition Dover, 1989.

[83] Luce, R.D., Utility of Gains and Losses, Erlbaum, Mahwah, New Jersey, 2000.

[84] Machina, M.J., "Expected utility" analysis without the independence axiom, Econometrica, 50, 277, 1982.

[85] Machina, M.J., Choice under uncertainty: problems solved and unsolved, Economic Perspectives, 1, 121, 1987.

[86] Majumdar, M., Equilibrium, Welfare, and Uncertainty: Beyond Arrow-Debreu, Routledge, 2009.

[87] Majumdar, M. and Rotar, V., Equilibrium prices in a random exchange economy with dependent summands, Economic Theory, 15, 531-550, 2000.


[88] Majumdar, M. and Rotar, V., Some general results on equilibrium prices in large random exchange economies, Annals of Operations Research, 114, 245-261, 2002.

[89] Marshall, A.W. and Olkin, I., A multivariate exponential distribution, Journal of the American Statistical Association, 62, 30, 1967.

[90] Marshall, A.W. and Olkin, I., Families of multivariate distributions, Journal of the American Statistical Association, 83, 834, 1988.

[91] Meyers, G., Coherent measures of risk: An exposition for the lay actuary, http://casact.org/pubs/forum/00sforum/meyers/Coherent%20Measures%20of%20Risk.pdf

[92] Michel, R., On Berry-Esseen results for the compound Poisson distribution, Insurance: Mathematics and Economics, 13, 1, 35, 1993.

[93] Mikosch, T., Non-Life Insurance Mathematics. An Introduction with Stochastic Processes, Springer, Berlin, Heidelberg, New York, 2004.

[94] Milevsky, M.A., The Calculus of Retirement Income, Cambridge University Press, New York, 2006.

[95] Minc, H., Nonnegative Matrices, Wiley, New York, 1998.

[96] von Neumann, J. and Morgenstern, O., The Theory of Games and Economic Behavior, Princeton University Press, New York, 1947.

[97] Owen, G., Game Theory, 3rd ed., Academic Press, San Diego, 1995.

[98] Panjer, H. and Willmot, G., Insurance Risk Models, Society of Actuaries, Schaumburg, Illinois, 1992.

[99] Panjer, H. (ed.) et al., Financial Economics, The Actuarial Foundation, 1998.

[100] Petrov, V.V., Limit Theorems of Probability Theory. Sequences of Independent Random Variables, Clarendon Press, Oxford, 1995.

[101] Pesonen, Matti, Optimal reinsurance, Scandinavian Actuarial Journal, 2, 65, 1984.

[102] Pitman, J., Probability, Springer-Verlag, New York, 1993.

[103] Prelec, D., The Probability Weighting Function, Econometrica, 66, No. 3, 497-527, 1998.

[104] Presman, E., On the approximation in variation of the distributions of a sum of independent Bernoulli random variables by the Poisson law, Theory Probab. Appl., 18, 2, 39, 1983.

[105] Promislow, S.D., Fundamentals of Actuarial Mathematics, 2nd edition, Wiley, 2010.

[106] Quiggin, J., Generalized Expected Utility Theory, Kluwer, Boston, 1993.

[107] Quiggin, J., A theory of anticipated utility, Journal of Economic Behavior and Organization, 3, 324, 1982.


[108] Quiggin, J., Stochastic dominance in regret theory, Review of Economic Studies, 57, 2, 503, 1989.

[109] Raviv, A., The design of an optimal insurance policy, American Economic Review, 69, 84-96, 1979.

[110] Reich, A., Properties of premium calculation principles, Insurance: Mathematics and Economics, 5, 97, 1986.

[111] Renyi, A., Foundations of Probability, Holden-Day, San Francisco, 1970.

[112] Revuz, D. and Yor, M., Continuous Martingales and Brownian Motion, Springer-Verlag, Berlin, New York, 1991.

[113] Rinott, Y. and Rotar, V., On Edgeworth expansions for dependency-neighborhoods-chain structures and Stein's Method, Probability Theory and Related Fields, 126, 528-570, 2003.

[114] Roell, A., Risk aversion in Quiggin and Yaari's rank-order model of choice under uncertainty, Economic Journal, 97, 143, 1987.

[115] Ross, S.M., Stochastic Processes, Wiley, New York, 1996.

[116] Ross, S.M., A First Course in Probability, 6th ed., Prentice Hall, Upper Saddle River, New Jersey, 2002.

[117] Ross, S.M., Introduction to Probability Models, 6th ed., Academic Press, San Diego, London, Boston, 1997.

[118] Rotar, G.V., A problem of control of reserve, Theory Probab. Appl., 17, 3, 597, 1972.

[119] Rotar, G.V., On a problem of control of reserve, Economics and Mathematical Methods, 12, 733, 1976.

[120] Rotar, V.I., Probability Theory, World Scientific, Singapore, River Edge, New Jersey, 1997.

[121] Rotar, V.I., Actuarial Models: The Mathematics of Insurance, CRC Press, 2006.

[122] Rotar, V.I., Probability and Stochastic Modeling, CRC Press, 2013.

[123] Rotar, V.I. and Shorgin, S.Ya., On reinsurance of risks and a retention value, Economics and Mathematical Methods, 32, 4, 124, 1996.

[124] Royden, H.L., Real Analysis, 3rd ed., Prentice Hall, New Jersey, 1988.

[125] Schlesinger, H., Insurance demand without the expected-utility paradigm, The Journal of Risk and Insurance, 64, 1, 19-39, 1997.

[126] Schmidt, K.E., Positive homogeneity and multiplicativity of premium principles on positive risk, Insurance: Mathematics and Economics, 8, 1315, 1989.

[127] Senatov, V.V., Normal Approximation: New Results, Methods, and Problems, VSP, Utrecht, Boston, 1998.


[128] Shiganov, I.S., Refinement of the upper bound of the constant in the central limit theorem, J. Soviet Mathematics, 35, 3, 2545, 1986.

[129] Shiryaev, A.N., Probability, Springer, New York, 1996.

[130] Shiryaev, A.N., Essentials of Stochastic Finance: Facts, Models, Theory, World Scientific, River Edge, New Jersey, 1999.

[131] Sholomitskii, A.G., Choice Under Uncertainty and Modeling of Risk (in Russian), GU-VSE (The High Economic School), Moscow, 2005.

[132] Shorgin, S., Approximation of a generalized binomial distribution, Theory Probab. Appl., 22, 4, 867, 1977.

[133] Smolyak, S.A., On the dispersion of efficiency in the case of uncertainty (in Russian), in: Methods and Models of Stochastic Optimization, pp. 181-212, Moscow, CEMI, 1983.

[134] Smolyak, S.A., Estimation of Efficiency of Investment Projects in the Case of Risk and Uncertainty (in Russian), Nauka, Moscow, 2002.

[135] Stampfli, J. and Goodman, V., The Mathematics of Finance, Modeling, and Hedging, Brooks/Cole, Pacific Grove, CA, 2001.

[136] Stewart, J., Single Variable Calculus, 5th ed., Thomson-Brooks/Cole, Belmont, California, 2003.

[137] Stewart, J., Multivariable Calculus, 5th ed., Thomson-Brooks/Cole, Belmont, California, 2003.

[138] Tapiero, C., Zuckerman, D., and Kahane, Y., Optimal Investment-Dividend Policy of an Insurance Firm under Regulation, Scandinavian Actuarial Journal, 65, 1983.

[139] Taylor, H. and Karlin, S., An Introduction to Stochastic Modeling, 3rd ed., Academic Press, San Diego, London, Boston, 1998.

[140] Wakker, P.P., Additive Representations of Preferences, Kluwer, Dordrecht, Boston, 1989.

[141] Waldmann, K.-H., On optimal dividend payments and related problems, Insurance: Mathematics and Economics, 7, 237, 1988.

[142] Webster, R., Convexity, Oxford University Press, Cambridge [Eng.], 1994.

[143] Webster's New Universal Unabridged Dictionary, Barnes and Noble Books, 2003.

[144] Winterfeldt, D. von and Edwards, W., Decision Analysis and Behavioral Research, Cambridge University Press, Cambridge [Cambridgeshire], New York, 1986.

[145] Williams, R.J., Introduction to Mathematics of Finance, American Mathematical Society, 2006.

[146] Yaari, M.E., The dual theory of choice under risk, Econometrica, 55, 95-115, 1987.

[147] Yates, J.F., Judgment and Decision Making, Prentice-Hall, New Jersey, 1990.


[148] Zarei, S., Survival Distributions and Analytic Laws of Mortality, Master's Thesis, San Diego State University, 2006.

[149] Zilcha, I. and Chew, S.H., Invariance of the efficient sets when the expected utility hypothesis is relaxed, Journal of Economic Behavior and Organization, 13, 125-131, 1990.

[150] The Casualty Actuarial Society and the Canadian Institute of Actuaries, Exam 3, Fall 2003, http://casact.org/admissions/studytools/exam3/.

[151] The Casualty Actuarial Society and the Canadian Institute of Actuaries, Exam 3, Fall 2004, http://casact.org/admissions/studytools/exam3/.

[152] The Casualty Actuarial Society and the Canadian Institute of Actuaries, Exam 3, Spring 2004, http://casact.org/admissions/studytools/exam3/.

[153] The Casualty Actuarial Society and the Canadian Institute of Actuaries, Exam 3, Spring 2005, http://casact.org/admissions/studytools/exam3/.

[154] The Society of Actuaries, Exam 3, Fall 2004, http://www.soa.org/STATIC/examinations.html.

[155] The Society of Actuaries, Exam 3, Fall 2005, http://www.soa.org/STATIC/examinations.html.

[156] The Casualty Actuarial Society and the Canadian Institute of Actuaries, Exam 3L, Fall 2011, http://www.casact.org/admissions/studytools/exam3/.

[157] The Casualty Actuarial Society and the Canadian Institute of Actuaries, Exam 3L, Spring 2012, http://www.casact.org/admissions/studytools/exam3/.

[158] The Casualty Actuarial Society and the Canadian Institute of Actuaries, Exam 3L, Fall 2012, http://www.casact.org/admissions/studytools/exam3/.

[159] The Casualty Actuarial Society and the Canadian Institute of Actuaries, Exam 3L, Spring 2013, http://www.casact.org/admissions/studytools/exam3/.




Answers to Exercises 


Possible additional remarks and errata will be posted at http://actuarialtextrotar.sdsu.edu


CHAPTER 0 


3. 
4. 


ee 


. (a) E{Y} = 5, Vary } = uas 2. (b) E{Y} = pa, Var{Y } = pa. 


E{Z3|Zi} = (1—-Zj)°/S. 


(b) fx,)s(x|5) = w Cu (1 aa ; E{X |S}= 4S. 


. E{Z3|Z1,Zo} = (1 — Z? — Z3)?/5. 


f(y|x) =2(a+y)/(2x+1), E{Y |X} = zx Atx =0. 


. f(y|x) = 3 for 0 < y < x, E{Y |X} =2X/3. 


mı +m 


(c) E{Y} =h/2, Var{Y} ae P(Y =0) = 1 (1 —e™™) and P(Y = 1) 
1 (1—e™(1+A)). 


CHAPTER 1 


2. 
. (a) [0,0.7), [0.8,1]. (b) m > 1; Y < Yo ~ 0.8. (c) m > 1, Y< 1 — Yo ~ 0.2 (d) y > 0.5. 
. (a) [0,0.1) U[0.4,0.7) U [0.8,1]. (2) y < 1. 

. The investment into one asset is more profitable for y > Yo ~ 0.68. 

. No; 4. 

. 0.183. 

. (c) 25. (d) 8.05. (e) 50.7. 

. (a) ~ 0.31. 

. No. 


3. 


. The first. 

. Worse. 

. Figure (a). 

. (b) For example, (0.2,0.5,0.3). 

. (c) For example, (0.1,0.3,0.3,0.3). 

. (0.1 + 0.1t, 0.5 − 0.2t, 0.4 + 0.1t) for −1 < t < 2.5.


50. (a) Yes. (b) G, ip 1) is better than (2, i> ib): (c) No. However, this does not mean that Fred is a risk averter.

52. For example, (4, B).

55. c ≈ 0.433.

58. 5(1 + 2d − √(1 + 2d)).

60. 0.1582.

61. (a) ≈ 0.69m; (b) ≈ 0.59m.


CHAPTER 2 


. (a) No if the number is positive. If the number is negative, then the c.v. changes the sign. (b) No. (c) Exponential. (d-ii) ≈ 0.246.


. Forallc <a. 
. (c) Heavy-tailed. 


Only a. 


. (a) 1.5, 97.75. (b) 1.354, 83.905. (c) 1.35, 83.2275. 


. (a) = 0.267. (b) ~ 0.0134. 

. (a) ~ 57.4%. (b) = 0.918. (c) $0.82, $3.96. 

. œ 106.3. P 

. b) E{Y} = Ze “(1 +ad), E{Y?} = Fe re (1 +.ad + (ad). 
Ə 3.12 

-@ a D0 +a © a0 447312)? 

Pij praja r a 

. (a) fs(x) = 2(e™ —e7™). 


(b) fs(x) =F if O<x <1, fox) =}if1<x<2, fox) = 33-2) if2<x<3. 


. (a) 4. (c) & 0. 567, ~ 0.433. 
. (a) 0.5. (b) ~ 0.303. 


26. ~ 0.187. : : 

27. (b) = 0.772. ©) C1 =, Q= f. 

33. Figure (b). 

38. ios = 1 and P(X =3) = 0.2. 

42. f(x) = La a5 4 444 e ™ 
; E , 6 30 6 i 

46. Yes. 

47. (a) 0 ~ 0.105. (b) 0.074. (c) 0.113. 


. (b) For the monetary unit equal to $1000, the expected value for a separate policy is 0.05, the variance is 0.0975. For the total payment, E{S} = 100, Var{S} = 195. (d) θ ≈ 0.229. (e) θ > 0.281. (f) θ ≈ 0.162.


. (a) 8 > 0.063. (b) Ono =~ 0.049, 8, ~ 0.058. The premium for the first portfolio 


~ 1.058, for the second ~ 1.116. 


° (a) Oheuristic z~ 0.088. (b) 6 > 0.119. 
. (a) 2255. 


. (a) E 


56. 
57. 


60. 


61. 




1000 2 ~_ _ 200 
v= > a 32573. 
~ 0.0665 z. 
(1) For the expected value principle, m(X) = n(Y) = 1 + 


(2) For the variance principle, m(X) = 1 +À, n(Y) = 1 + A 

(3) For the standard deviation principle: n(X) = 1 +À, 7 MK )=1+4A/V3. 

(4) For the mean value principle, 1(X) = v2, n(Y) = a 

(5) For the exponential principle, n(X) = pin is (we consider only B < 1), while 
n(Y) = gin ( om): In this case, 1(X) > n(Y). 

(6) For the Escher principle, 1(X) = 2, and n(Y) = -4 © 1.16. In this case, 1(X) > 
n(Y). 

Indeed, in this case, the equation (4.10) is equivalent to E{X%} = P®, which leads to 
(4.3). 
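
The Chapter 2 answer above that quotes a per-policy mean of 0.05 and variance of 0.0975 (in units of $1000) together with E{S} = 100 and Var{S} = 195 is consistent with a portfolio of independent, identically distributed policies, for which means and variances add. The following is a minimal sketch of that check; the portfolio size n = 2000 is an assumption inferred from 100/0.05 and is not stated in this answer key, and the function name is hypothetical.

```python
from fractions import Fraction


def portfolio_moments(mean_one, var_one, n):
    """Mean and variance of the total payment for n independent,
    identically distributed policies (means and variances add)."""
    return n * mean_one, n * var_one


# Per-policy moments quoted in the answer, in units of $1000.
mean_one = Fraction(5, 100)      # 0.05
var_one = Fraction(975, 10000)   # 0.0975

# Assumed portfolio size (inferred from E{S} / E{X} = 100 / 0.05).
n = 2000

mean_total, var_total = portfolio_moments(mean_one, var_one, n)
print(float(mean_total), float(var_total))
```

Exact fractions are used so that the comparison with the quoted totals does not depend on floating-point rounding.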


CHAPTER 3 


. For the same premium, the joint insurance is more profitable. 
. (a) A= 18.92, ĝg„ ~ 0.02695. (b) P(N = 15) ~ 0.066, P(N < 15) ~ 0.2200. 


(c) Fifteen claims or fewer occur approximately once every five years.


. E{A|N = 6) © 5.0598. 

. The mean of A given N = 5 is 4.2. The probability under consideration is 0.762. 
. (a) ≈ 245.62. (b) Not small. (c) ≈ 0.159.

. (a) 2, 3. (b) P(N < 4) = X$, C1”) (G) G)”. This probability is not small. 

. = 0.994. 

. (a) Yes, if pı #0. 


m 


20. For the zero-truncated version, po = 0 and we deal with the geometric distribution. 
21. ~0.117. 

22. (a) 425, 1275. (b) 74°; $1495, (c) 588; 89964. 

24. < 16.54. 

27. (a) = 0.0047, œ~ 0.0444, ~ 0.0720. (b) e™!?; ~ 0.00038. 

30. P(S = 0) = 0.01; f_S(x) = (0.18 + 0.81x)e^{−x} for x > 0.

31. 45. 

32. (b) 400, 1600. 

33. (a) © 0.5375. (b) 0.58356. (d) 13,000; 3,700,000; exp {10e! + 40e? — 50}. 
34. (b) 229%; 29.105. (c) £, 8, 20. 

35. E{S} = 11,500, Var{S} = 2,766,666.6. 

36. E{S} = 14,000, Var{S} = 4, 166, 666.6. 


- (0) 0% apy) 


(c) 


oz 
wate} ( =) ; 


a+b 
and c = (1 +0) 42. 




= 


a TIERE 


and c = (1+98)e a+b?/2 Mp) 


40. (b) O = gg,— 
(c) 


= 


41. 0 > cx 0.15. 


CHAPTER 4 


5. E{X_t} = exp{b²t/2}, Var{X_t} = exp{b²t}(exp{b²t} − 1).
9. (a) ≈ 9.36 · 10^{−14}, that is, a very small number; ≈ 0.208. (b) ≈ 0.082. (c) ≈ 0.265. (d) 0.8, 0.4.
10. (a) ≈ 6.14 · 10^{−6}; 0.999. (b) ≈ 0.368. (c) ≈ 0.857. (d) 2, 1.
11. (a) 0.2. (b) ~ 0.132. (c) ~ 0.812. (The time unit is a day.) 
12. x E{N Ns} = M +A (t +5), Corr{N,, Nips} = 


15. È 5 , = 0.6713. 

16. (c) © 232.484. (d) + 0.317. 

18. No. Yes. Yes. 

19. ~0.135. 

20. t<c#0.206. 

21. t < c7% 1.280. 

25. (a) = 309, ~ 464. (b) ~ 0.159. (c) = 0.114. 

26. 3650; ~ 24333; ~ 0.879. 

27. 945. 

28. (a) 165. (b) u > 1.4. 

29. (a) Yes. (b) Yes. (c) Yes. (d) ~ 0.147. (e) 0.639. 
32. For Example 1, the rows of P^(2) are
    (0.505, 0.4, 0.095),
    (0.465, 0.44, 0.095),
    (0.4425, 0.46, 0.0975).


2 2 2 2 
For Example 2, pÈ =0, p? =], pe = Px+t4x+t+1 t 4x+t, pe = Px+t* Px+t+1- 
33. For Example 1, 0.18; for Example 2, Px+t* qx+t+1; for Example 4, p? (1 — p)’. 


35. ~ 6.6795, x 2. 


t m m 
me (DET Gs a2 


36. ~ 18.809. 

38. + 9.13. 

41. =~ 13, ~ 58.33. 
44. 2, 12.93. 

49. ~ 0.189. 


50. ~ 1.14, ~ 0.159. 
51. ~ 0.038,% 0.124, ~ 0.443, + 0.269, ~ 0.126. 
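
The mean and variance quoted in answer 5 of this chapter have the form of lognormal moments. A short derivation is sketched below under the assumption (not stated in this answer key) that X_t = exp{b w_t}, where w_t is a standard Brownian motion, so that b w_t is normal with zero mean and variance b²t.

```latex
% Assumption (hypothetical, not taken from the answer key):
% X_t = e^{b w_t}, with w_t a standard Brownian motion,
% so b w_t is normal with mean 0 and variance b^2 t.
% For a normal r.v. Z with mean 0 and variance s^2, E{e^Z} = e^{s^2/2}.
\begin{align*}
E\{X_t\}   &= E\{e^{b w_t}\}  = e^{b^2 t/2}, \\
E\{X_t^2\} &= E\{e^{2 b w_t}\} = e^{(2b)^2 t/2} = e^{2 b^2 t}, \\
\mathrm{Var}\{X_t\} &= E\{X_t^2\} - \left(E\{X_t\}\right)^2
                     = e^{2 b^2 t} - e^{b^2 t}
                     = e^{b^2 t}\left(e^{b^2 t} - 1\right).
\end{align*}
```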




CHAPTER 5 


. ~ 0.921. 

. v5. 

. Z 0.52. 

. ~ 0.670. 

. Any g(t) such that g(t) t^{−1/2} → 0.

. = 0.683. 

. The probability that at least one student will lose is ≈ 0.673. This probability is large, so the professor should warn the students about this.
i> he, 21 


° 2? 16°? 256° 
. 0.393. 
. No. 


. The price will never drop by 10% with probability ≈ 0.608. It will happen within one year with probability ≈ 0.277.


. = 0.667. 


CHAPTER 6 


9. 
10. 


11. 


13. 
14. 


15. 
16. 
18. 


22. 
23. 
25. 


14+ /1+8(1+6) 

4(1 +90) 
(a) ~ 0.005. (b) u > 57.62. (c) 0 ~ 0.234. 
(a) e + 2e* +e® = 4+ 13.2z. (b) No. (c) y€ [0.59,0.6], while approximation 
(2.2.25) gives y ~ 0.063. (d) u > 59.5. (e) y € [0.06,0.061]. The rest is practically 
the same since the solution does not differ much from the previous. 
y € [1.27,1.28]. Approximation (2.2.15) leads to 1.2. 
0 = 125, 
(b) If P(X =a) = 1, then F; is the distribution uniform on [0,a]. (c) fi (x) = $81 (x) + 
1782(x) + f g(x), where gı (x), g2(x), and g3(x) are the uniform densities on (0, 1], 
[1,2], and [2,3], respectively. 
y= 20 =3/7. 
(b) P(u) © 0.937026, 
u=6,...,11. 


ra 


CHAPTER 7 


4. No, because this function is not monotone.

5. A solution exists if the original probability multiplied by 1.1 does not exceed one. A solution is not unique. One example of a new force of mortality is the force that is ln 1.1 less than the original force (provided that such a change does not lead to negative values).

6.


s(x) = t 




zad- fine} e fax tye! (oaa) e 


For α = 1, the distribution is uniform on [0, ω].


. s(x) = (1 − x/ω)^α for x < ω and α > 0.


11. (a) 0.125. (b) ~ 0.674. 

14. ~ 0.0956. 

15. 75; ~ 0.327; ®. 

16. (a) (i) If we consider just the ratio of the expected values of male and female survivors, then the estimate of the proportion is approximately 1 : 1.06. (ii) ≈ 0.17.

17. E{L£(60)} ~ 63.25; Var{ £(60)} ~ 23.25. 

19. ~0.94. 

20. The latter. 

22. P(T (20) >t) © 0.27e~*/ + 0.73¢71/80, 

23. ~ 32.28 

24. 2. 

29. 4900. 

31. The answer to the second question is positive. 

33. 40 P40 © 0.54; u(20) © 0.000932; j1o)10¢40 © 0.06; 2140 = 0.00032, 21420 0.000932. 

35. a ~ 12.2; about 6% of the people survived 90 years, and 1.2% of the total group of 
100, 000. 

37. For s(60.5), we have ≈ 0.827496 or 0.8275; for s(61.3), we have ≈ 0.823497 or 0.8235. For μ(60.5), we have ≈ 0.00604231 or 0.0060422, so the difference is negligible. Next, po60.25 ≈ 0.9939667.

39. A ≈ 0.00151, B ≈ 0.0000215, a = 0.142.

40. For t > 1/(√2 − 1).

41. 2195) = 0.60536. P(J = 1 | 2 < T < 3) = 0.7.

43. 31970 ≈ 0.115; Var{ DP} = 1.50144; P(D') = 3) ≈ 0.1649, while the Poisson approximation gives ≈ 0.1662.

44. The estimates for qi k

46. fri(t, i) =e IS. L; f_T(t) = 0.15e^{−0.15t}; P(J =j) = L,

47. file =g e gk rOle PU) 0.175; P(J = 2) ≈ 0.825.

48. 10px ≈ 0.544; P(J = 1) = 5.

49. 10Px ≈ 0.273; P(J = 1) = 0.303.

55. = 0.9053.

57. 20P50:40 ≈ 0.577; 50P50:40 = 03 20P5p.a9 ≈ 9-946; soPso.ap ≈ 0.408.

58. 20P50:40 ≈ 0.7236, sopsp-gp ≈ 0.9826.

61. = 0.426.

63. 15925:20 ≈ 0.02225. In the absence of a common shock, 15q425:29 would be ≈ 0.01489, that is, about 30% less. Next, 1545559 ≈ 0.00752.
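
The Chapter 7 answer above about raising a survival probability by 10% notes that one admissible change is to lower the force of mortality by ln 1.1. The sketch below illustrates why, under two assumptions that are not stated in this answer key: the probability in question is a one-year survival probability, and the force of mortality is constant over that year. The numerical value of the force and the function name are hypothetical.

```python
import math


def survival_prob(force, t=1.0):
    """t-year survival probability under a constant force of mortality."""
    return math.exp(-force * t)


mu = 0.25                      # hypothetical original force of mortality
mu_new = mu - math.log(1.1)    # force lowered by ln 1.1

p_old = survival_prob(mu)
p_new = survival_prob(mu_new)

# exp(-(mu - ln 1.1)) = 1.1 * exp(-mu), so the ratio equals 1.1
# (up to floating-point rounding).
ratio = p_new / p_old
print(p_old, p_new, ratio)
```

The construction is valid only while the reduced force stays non-negative, which is the same as requiring that 1.1 times the original probability not exceed one.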


CHAPTER 8 


2. 


(a) Yes. (c) When delta is close to zero or/and x is large. 




12° 
. (b) Agog © 0.855. 
0,25. 
. =œ $53,000. 
. P(Z ≤ x) = x^{μ/δ}. The distribution is uniform if μ = δ.
. (a) © 0.267. (b) ~ 0.262. 
. (a) Non-positively correlated. (b) Non-positively correlated. 


u—sw 
l-s°* 


» (a) Ag9.35) © 0.386; Al -~ 0.054, and 35/430 © 0.146. (b) ~ 0.395 per $1 of benefit. 


30:35] 


. Aap © 0.223, Var{Z} ~ 0.076. 

. 0.032. 

. Aso.ag © 0.545; Asoo © 0.275. 

. 0.2002.
28. 0.
. None.
33. (a) All, under the condition P(0 < Ψ < ∞) > 0.
34. Underestimate.
36. No.


. 2 = 028 for ô > 0.01. 
. (a-i) (ĪĀ)_60 = (1 − (40δ + 1)e^{−40δ})/(40δ²).
(a-ii) (ĪĀ)_60 = μ/(μ + δ)².
(a-iii) ≈ 5.50, ≈ 7.45. (b-i) (DĀ)_{60:20|} = (20δ − 1 + e^{−20δ})/(40δ²).
(b-ii) (DĀ)_{60:20|} = μ[20(μ + δ) − 1 + e^{−20(μ+δ)}]/(μ + δ)².


. = 0.038. 

. + 0.839. 

. = 7.070. 

. APV = 0.93677, the probability under discussion is 0.06. 
. © 19.44. 

. ~ 0.580. 

. 5.744. 

. Axy & 0.965; Var{Z} ~ 0.000775. 
. (IA) x © 0.342; Var{Z} ~ 0.048. 
. © 2.8681. 

» (i) zaz GD © 0.347. 


57. (a) ~ 0.853. (b) © 1.415. (c) 0.281 (c1 +3) + 0.145(c2 + c4). 
CHAPTER 9 
1. ~ 41.957; ~ 26.292. 
6. (a) Y is uniform on [0, 1/δ]. (b) The d.f. F_Y(x) = 1 − (1 − δx)^{μ/δ} for x < 1/δ.
7. P(Y <x) = -P0 for x < 1/8. 
9. Yes. 
10. No. 




20. (a) All. (b) As δ → 0,

    lim ā_x = e°_x,
    lim ä_x = e_x + 1,
    lim ā_{x:n|} = e°_x − _np_x · e°_{x+n},
    lim ä_{x:n|} = e_x + _nq_x − _np_x · e_{x+n},
    lim _{m|}ā_x = _mp_x · e°_{x+m},
    lim _{m|}ä_x = _mp_x (e_{x+m} + 1).


25. Aiz ~ 0.355; 20px ~ 0.836. 

26. 0.966. 

27. c ≈ $31,866.

28. (a) ~ 58.82. (b) ~ 1670.38. 

29. For $100,000 as a unit of money, (a) ~ 27.07, ~ 57.24. (b) ~ 20.77, ~ 34.35. 


30. (1 +2008? — 208 — e~?%) / (2008°). 


31. =~ 92,300, ~ 9,080,000. 

32. © 21.983. 

33. © 18.75. 

35. (a) © 2.132. (b) © 5.78. 

40. (a) = $127,200. (b) ~ $132,600. (c) $69,900. 
41. 2.84c +0.66. 

43. (a) = 10.81; ~ 17.58. (b) = 9.98; ~ 16.74. 
44, dey — Ay. 

45. μ/[(μ + δ)(2μ + δ)].


46. o/|(u1 +8) (u1 +o +8)]. 




CHAPTER 10 


2. The results are given in the table below. 


Whole life insurance with a benefit payable 


P, and P( Ax) 
at the moment of death 


Whole life insurance with a benefit payable 


P(A,) and P, 
at the end of the year of death 


y, ae ; S Ži 
n-Year term life insurance with a benefit Plaan d P( Ala) 
payable at the moment of death 


n-Year endowment life insurance 


Pm and P(A,. 
with a benefit payable at the moment of death a (Axm) 


m-Year payment whole life insurance 


P, and „P(A 
with a benefit payable at the moment of death es mP(Ay) 


m-Year payment n-year term life insurance 


P! and mP (Aga) 
with a benefit payable at the moment of death f 


m? x:n] 


m-Year payment n-year term endowment 
life insurance with a benefit payable 
at the moment of death 


mem and mP (Axm) 


n-Year pure endowment life insurance P 


n-Year deferred whole life annuity-due 


5. The results are given in the table below. 


Whole life insurance with a benefit payable 


P=1/e,, P=1 1 
at the moment of death | es, [lest 1) 


Whole life insurance with a benefit payable 


P=1/e,,P=1 1 
at the end of the year of death / ex, Mo) 


n- Year term life insurance with a benefit 
payable at the moment of death 


P= nqx/ (Ex —nPx Ca 
P= nqx/ (ex + nx — nPxex+n) 


n-Year endowment life insurance 
with a benefit payable at the moment of death 


P= 1/(ex —nPx extn), 
P 1/(ex+ nqx — nPxex+n) 


m-Year payment whole life insurance 
with a benefit payable at the moment of death 


/ 
/(ex —mPx exim), 
/ (ex T mdx — nPxex+m) 




6. nP = a + r Axtn = = =~ aid +P tAytn- 
äx:m dyn ax: xn] 
7. 3/4. 
8. The second. 
9. (a) P59 = 0.0190. (b) P(Ln < 0) ~ 0.747. 
10. P} aq © 0.013. 
11. P = 16.657. 
12. (a) Py-y © 0.050. (b) Pry ~ 0.013. (c) P ~ 0.0196. 


13. Var{Lp} = (1 —e7*) (s+se~* —2+2e-*) /[2(e7 
15. n > 1580. 
16. P~ J ope ~P, ge as C — 9, 
aß "2R 
17. > 13.3%. 
18. The premiums are smaller for the first group. 
19. v(1—q). 
20. k > 17/15. 
21. ~ 0.0186. 
22. (a) $7407 per year. (b) $7279 per year. 




m-Year payment n-year term life insurance 
with a benefit payable at the moment of death 


P= Adal (€x — mPx esim) 


P= nqx/ (ex + m4x — nPxex+m) 


m-Year payment n-year term endowment 
life insurance with a benefit payable 
at the moment of death 


P= 1 /(ex — mPx Ct) , 
P= 


1/(ex + mQx — nPxex+m) 


n-Year pure endowment life insurance 


Pp nPx/(@x —nPx eiin), 
/ ex + ndx — nPxex+n) 


n-Year deferred whole life annuity-due 


( 
P= nPx* (Cxtn+1)/ (ex 
P= apx: (ex4n +1)/(ex 


—nPx Extn), 
+ nx — nPxex+n) 


Pe oe Al. 


< (© © 0.080. 


. (a) oV = 0, iV x 0.2647, 2V ~ 0.1364, 3V = (0: 
(b) oV = 0,1V ~ 6.4239, 2V ~ 12.9054, 3V = 20. 


(c) oV =0, 1V + 0.0098, 2V + 0.01. 


5_1+5)]. 




CHAPTER 11 


10. 


11. 
12. 


. (a) A quadratic function. (b) (i) 40 years later. (ii) No solution.
. (a) ≈ 65.65. (b) 40.55 years later, which is too much. However, if the participants had been twenty years younger, it would be realistic.


. (a) 0. A hyperbola. (b) s_x = b(10 + x), where b is an arbitrary multiplier on which the ratio s_{x+t}/s_x does not depend.


0.815 S? ypt ay” Ody. 
. (a) vip B(x,h,r —x)ā,. (b) $224,385. 


. An additional condition should concern μ_x^{(r)}(y). For example, we may require μ_x^{(r)}(y) for the first plan to be not smaller than for the second for all y's.


. (a) e HO) fe CB(x,h, y)dy. (b) ~ 8014. 


2+8 


. (a) woePX. This is just the salary at the moment of retirement, and we take into 


account only this salary once the corresponding weight y approaches infinity. 
cw(0)Ke®. 
The moment when the remaining income reaches the level w is 


The problem makes sense if f < K. The accumulated capital 
K z5oPK 
C= L G + (1 — c) e7 (P-P) ae ef) - < (F — en) > 68) 


Any expression of the type of e“ may be also written as 


( ") < a o i 


though probably it makes sense to leave (3) as is. 
(a) Expression (1.2.4), as a function of d, is non-increasing. (b) $25,789. 
(a) wo(Ta — T;). 


(b) ono _ (o+) (1 = eee?) 


o+u—B 

13. é,. 
14. (a) Not necessarily. 
18. (a) In(1+B) +u(x) +ô. (b) B+ u(x) +6. 
19. (NC)144 = NG 

kw(r)(r—a)e~ OTH (r-a kw(r)(r—a 8 r—x) 1—eT Hr) 
20. ( a and ae ) : e~l +u)( J1- e 
CHAPTER 12 

1. Var {Ri a(10) (X) } ~ 1.38, Var {Rij (X1)} ~ 0.33, and hence 10Var{ Ri) (X1)} re 3.3. 


The an ee reinsurance gives the same result as the excess-of-loss reinsurance on 
the average, but the riskiness of the former is less than that of the latter reinsurance. 




. (a) So > 88,192. If 6 = 90,000, then rı ~ 0.953. 


(b-ii) Vretained © 2k?m/(1 — q). (b-iii) Vretainea & 0.55. For the second question, q < 
0.048. 


< Yn = mO2,,,,/(267A). 


reins 


. (a) ro = 3, Yra © 0.106a. 

. (b) ro = 0.33. 

. (d) V = (k! (1 — k)! )E{S!/3} for 0.453 < k < 0.547. 

. E{Z;} =2-+c;, where cy = —1, c2 = 0, c3 = 1. 

. (b) 7r +372 = 5. 

. (a) G(X;) =m+ È, r; = 1. (b) G(X) =m + + (02(1 — p) + pois), where ns = 


nm? ms 


Li Of 


Actuarial Models: The Mathematics of Insurance, Second Edition 
thoroughly covers the basic models of insurance processes. It also presents the 
mathematical frameworks and methods used in actuarial modeling. This second 
edition provides an even smoother, more robust account of the main ideas and 
models, preparing readers to take exams of the Society of Actuaries (SOA) and 
the Casualty Actuarial Society (CAS). 


New to the Second Edition 

• Revises all chapters, especially material on the surplus process

• Takes into account new results and current trends in teaching actuarial modeling

• Presents a new chapter on pension models

• Includes new problems from the 2011-2013 CAS examinations


Like its best-selling, widely adopted predecessor, this edition is designed for 
students, actuaries, mathematicians, and researchers interested in insurance 
processes and economic and social models. The author offers three clearly 
marked options for using the text. The first option includes the basic material for 
a one-semester undergraduate course, the second provides a more complete 
treatment ideal for a two-semester course or self-study, and the third covers 
more challenging topics suitable for graduate-level readers. 


Features 

• Gives a comprehensive, self-contained exposition of the basic mathematical models of insurance processes

• Contains the standard material taught in actuarial modeling courses

• Covers several advanced topics, such as the modern theory of risk evaluation, cash flows in the Markov environment, and more

• Includes real problems from past CAS exams to prepare readers for their actuarial exams

• Provides many computational examples using Excel to illustrate the results

Solutions manual available upon qualifying course adoption 



K22520
ISBN: 978-1-4822-2706-2

CRC Press
Taylor & Francis Group, an informa business
6000 Broken Sound Parkway, NW, Suite 300, Boca Raton, FL 33487
711 Third Avenue, New York, NY 10017
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN, UK


